Image decoding apparatus, image decoding method

ABSTRACT

A bit stream generated through a bit stream extraction process from a bit stream including a prescribed layer set, and including only a layer set made up of a subset of the prescribed layer set, may contain a layer incapable of being decoded. An aspect of the present disclosure specifies a bit stream constraint pertaining to a parameter set, and a bit stream extraction process.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 15/136,705, filed on Apr. 22, 2016, which is a continuation of International Application No. PCT/JP2014/077931, filed on Oct. 21, 2014. The International Application claims priority to Japanese Patent Application No. JP2013-231347, filed on Nov. 7, 2013 and Japanese Patent Application No. JP2013-219443, filed on Oct. 22, 2013. All of the afore-mentioned patent applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

The present disclosure relates to an image decoding apparatus that decodes hierarchically coded data in which images are hierarchically coded, and to an image coding apparatus that generates hierarchically coded data by hierarchically coding images.

BACKGROUND

Information transmitted by a communication system or recorded in a storage apparatus includes images or video. Conventionally, techniques for coding images (hereinafter including video) in order to transmit and store these images have been known.

Video coding schemes, such as AVC (H.264/MPEG-4 Advanced Video Coding) and its successor HEVC (High-Efficiency Video Coding), have been known (Non-Patent Literature 1).

According to these video coding schemes, typically, a predictive image is generated on the basis of a local decoded image obtained by coding/decoding an input image, and a predictive residue (referred to as a “difference image” or a “residual image”), which is obtained by subtracting the predictive image from the input image (original image), is coded. Methods of generating a predictive image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).

HEVC assumes reproduction at a temporally decimated frame rate, such as reproducing 60 fps content at 30 fps, and adopts a technique for achieving temporal scalability. More specifically, each picture is assigned a numerical value called a temporal identifier (Temporal ID, sub-layer identifier), and a constraint is imposed that a picture with a smaller temporal identifier does not refer to a picture with a larger temporal identifier. Consequently, in the case of decimating pictures above a specific temporal identifier for reproduction, the pictures assigned the larger temporal identifiers are not required to be decoded.
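As a minimal illustration of this constraint, the following C++ sketch drops every picture whose temporal identifier exceeds a chosen threshold; the Picture record is hypothetical and carries only the field needed here. Because no retained picture refers to a dropped one, the retained pictures remain decodable.

    #include <vector>

    // Hypothetical picture record; the coded payload is omitted.
    struct Picture {
        int temporalId;  // temporal identifier (sub-layer identifier)
    };

    // Keep only pictures whose temporal ID does not exceed targetTid. With a
    // typical two-level 60 fps hierarchy, choosing targetTid one below the
    // maximum yields 30 fps reproduction.
    std::vector<Picture> decimateByTemporalId(const std::vector<Picture>& pictures,
                                              int targetTid) {
        std::vector<Picture> kept;
        for (const Picture& p : pictures)
            if (p.temporalId <= targetTid)
                kept.push_back(p);
        return kept;
    }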

In recent years, a scalable coding technique or a hierarchical coding technique that hierarchically codes images according to a required data rate has been proposed. SHVC (Scalable HEVC) and MV-HEVC (MultiView HEVC) have been known as typical scalable coding schemes (hierarchical coding methods).

SHVC supports spatial scalability, temporal scalability, and SNR scalability. For example, in the case of spatial scalability, an image down-sampled from an original image to a desired resolution is coded as a lower layer. Next, on a higher layer, inter-layer prediction is performed in order to remove inter-layer redundancy (Non-Patent Literature 2).

MV-HEVC supports viewpoint scalability (view scalability). For example, in the case of coding three viewpoint images, that is, a viewpoint image 0 (layer 0), a viewpoint image 1 (layer 1) and a viewpoint image 2 (layer 2), inter-layer redundancy can be removed by predicting the viewpoint images 1 and 2 on the higher layers from the viewpoint image 0 on the lower layer (layer 0) through inter-layer prediction (Non-Patent Literature 3).

Inter-layer predictions used in scalable coding schemes, such as SHVC and MV-HEVC, include inter-layer image prediction and inter-layer motion prediction. The inter-layer image prediction generates a predictive image on a target layer using texture information (image) of a decoded picture on a lower layer (or a layer different from the target layer). The inter-layer motion prediction generates a predictive value of motion information on the target layer using the motion information of a decoded picture on a lower layer (or a layer different from the target layer). That is, inter-layer prediction is performed using a decoded picture on a lower layer (or a layer different from the target layer) as a reference picture on the target layer.

Besides the inter-layer prediction that removes redundancy in image information or motion information between layers, there is also prediction between parameter sets. A parameter set (e.g., a sequence parameter set SPS, a picture parameter set PPS, etc.) defines a set of coding parameters required to decode/code coded data. In order to remove the redundancy of coding parameters common to layers, the prediction between parameter sets predicts a part of the coding parameters in the parameter set used for decoding/coding on an upper layer from the corresponding coding parameters in the parameter set used for decoding/coding on a lower layer (also called reference or inheritance), and omits decoding/coding that part of the coding parameters. For example, there is a technique, notified in SPS and PPS, that predicts scaling list information (quantization matrix) on a target layer from scaling list information on a lower layer (also called syntax prediction between parameter sets).

In the cases of view scalability and SNR scalability, the parameter sets used for decoding/coding on the respective layers contain many common coding parameters. Accordingly, there is a technique called a shared parameter set, which removes the redundancy of side information (parameter sets) between layers by using parameter sets common to different layers. For example, in Non-Patent Literatures 2 and 3, an SPS or PPS (whose layer identifier has a value of nuhLayerIdA) used for decoding/coding on the lower layer having a layer identifier value of nuhLayerIdA is allowed to be used for decoding/coding on a higher layer having a layer identifier value (nuhLayerIdB) higher than nuhLayerIdA. Through an NAL unit header in an NAL unit that stores coded data, such as coded data on an image or a parameter set of coding parameters, a layer identifier for identifying a layer (also called nuh_layer_id, layerId, or lId), a temporal identifier for identifying a sub-layer associated with the layer (also called nuh_temporal_id_plus1, temporalId, or tId), and an NAL unit type (nal_unit_type) representing the kind of the coded data stored in the NAL unit are notified.
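As one possible reading of the shared-parameter-set rule above, the following C++ sketch resolves which SPS a layer may reuse. The registry, record, and helper names are illustrative assumptions, not part of any cited scheme.

    #include <map>
    #include <optional>
    #include <vector>

    // Hypothetical SPS record; coding parameters are omitted.
    struct Sps {
        int spsId;    // SPS identifier
        int layerId;  // nuh_layer_id carried in the SPS NAL unit header
    };

    // All SPSs received so far, keyed by SPS identifier (illustrative container).
    std::map<int, std::vector<Sps>> receivedSps;

    // Shared-parameter-set rule of Non-Patent Literatures 2 and 3: an SPS
    // carried with layer identifier nuhLayerIdA may be used by a layer whose
    // identifier nuhLayerIdB satisfies nuhLayerIdB >= nuhLayerIdA.
    std::optional<Sps> resolveSharedSps(int spsId, int decodingLayerId) {
        auto it = receivedSps.find(spsId);
        if (it == receivedSps.end()) return std::nullopt;
        for (const Sps& s : it->second)
            if (s.layerId <= decodingLayerId)
                return s;  // lower-layer SPS reused by the higher layer
        return std::nullopt;
    }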

In Non-Patent Literatures 2 and 3, as to a video parameter set VPS that defines a set of coding parameters to be referred to for decoding coded data made up of at least one layer, there is a bit stream constraint “the VPS layer identifier is set to zero (nuh_layer_id=0)”.

In Non-Patent Literature 4, as to a sequence parameter set SPS that defines a set of coding parameters to be referred to for decoding the target sequence, and a picture parameter set PPS that defines a set of coding parameters to be referred to for decoding each picture in the target sequence, a bit stream constraint “the layer identifiers of SPS and PPS are set to zero (nuh_layer_id=0)” is proposed.

CITATION LIST

Non-Patent Literature

-   Non-Patent Literature 1: “Recommendation H.265 (April 2013)”, ITU-T (disclosed on Jun. 7, 2013)
-   Non-Patent Literature 2: JCTVC-N1008_v3 “SHVC Draft 3”, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Vienna, AT, 25 Jul.-2 Aug. 2013 (published on Aug. 20, 2013)
-   Non-Patent Literature 3: JCT3V-E1008_v5 “MV-HEVC Draft Text 5”, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Vienna, AT, 27 Jul.-2 Aug. 2013 (published on Aug. 7, 2013)
-   Non-Patent Literature 4: JCTVC-O0092_v1 “MV-HEVC/SHVC HLS: On nuh_layer_id of SPS and PPS”, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 15th Meeting: Geneva, CH, 23 Oct.-1 Nov. 2013 (published on Oct. 14, 2013)

SUMMARY

Problem to be Solved

However, under the constraints on the layer identifiers pertaining to VPS, SPS and PPS in the conventional techniques (Non-Patent Literatures 2 to 4), consider bit stream extraction from a bit stream that includes a layer A having a layer identifier value of nuhLayerIdA and a layer B having a layer identifier value of nuhLayerIdB. When the coded data on the layer A is discarded and a bit stream consisting only of the coded data on the layer B is extracted through the bit stream extraction, a parameter set on the layer A (whose layer identifier has the value nuhLayerIdA) required to decode the layer B may be discarded. This causes a problem in that the extracted coded data on the layer B cannot be decoded.

More specifically, as shown in FIG. 1(a), it is assumed that a bit stream includes a layer set A {LayerIdList={nuh_layer_id=0, nuh_layer_id=1, nuh_layer_id=2}} made up of a layer 0 (L #0 in FIG. 1(a)), a layer 1 (L #1 in FIG. 1(a)), and a layer 2 (L #2 in FIG. 1(a)) that have layer identifier values of L #0 (nuh_layer_id=0), L #1 (nuh_layer_id=1), and L #2 (nuh_layer_id=2), respectively. In the example of the layer set A in FIG. 1(a), according to the inter-layer dependency relationship in the layer set A, the VCL (video coding layer) on the layer 2 (VCL L #2 in FIG. 1(a)) depends on the VCL on the layer 1 (VCL L #1 in FIG. 1(a)) as the reference layer of inter-layer prediction (inter-layer image prediction, inter-layer motion prediction) (an upper solid-line arrow in FIG. 1). As shown by thick arrows in FIG. 1(a), according to the reference relationship among parameter sets, the VPS on the layer 0 (VPS L #0 in FIG. 1(a)) is referred to by the parameter sets and VCLs on each of the layers 0 to 2 (SPS L #0, PPS L #0, VCL L #0, SPS L #1, PPS L #1, VCL L #1 and VCL L #2 in FIG. 1(a)). The SPS on the layer 0 is referred to by the PPS and VCL on the layer 0, and the PPS on the layer 0 is referred to by the VCL on the layer 0. Likewise, the SPS on the layer 1 is referred to by the PPS and VCL on the layer 1 and the VCL on the layer 2, and the PPS on the layer 1 is referred to by the VCL on the layer 1 and the VCL on the layer 2.

A sub bit stream that only includes a layer set B {LayerIdListTarget={nuh_layer_id=1, nuh_layer_id=2}}, which is a subset of the layer set A and is a decoding target, is extracted (bit stream extraction) from the bit stream including the layer set A (FIG. 1(b)). However, the parameter set (the VPS (VPS L #0 in FIG. 1(b)) having the layer identifier value L #0 (nuh_layer_id=0)) referred to for decoding the coded data on the layers 1 and 2 in the layer set B does not exist in the extracted bit stream. Consequently, a case where the coded data on the layers 1 and 2 cannot be decoded may occur.

The present disclosure has been made in view of the above problems, and has an object to achieve an image decoding apparatus and an image coding apparatus that specify a bit stream constraint pertaining to a parameter set and a bit stream extraction process, and that prevent occurrence of a layer that cannot be decoded on a bit stream which is generated through a bit stream extraction process from a bit stream including a layer set and which only includes a layer set that is a subset of that layer set.

Solution to Problem

To solve the problem, an image decoding apparatus according to an aspect of the present disclosure includes: an image-coded data extractor that extracts image coded data pertaining to a decoding target layer set including at least one layer, from input image coded data, based on a layer ID list indicating the decoding target layer set; and a picture decoding unit that decodes a picture in the decoding target layer set from the extracted image coded data, wherein the image coded data extracted by the image-coded data extractor does not include a non-VCL NAL unit having a layer identifier that is not equal to zero and is not included in the layer ID list.

An image decoding method according to an aspect of the present disclosure is an image decoding method of decoding input image coded data, including: an image-coded data extracting step of extracting image coded data pertaining to a decoding target layer set including at least one layer, from the input image coded data, based on a layer ID list indicating the decoding target layer set; and a picture decoding step of decoding a picture in the decoding target layer set from the extracted image coded data, wherein the image coded data extracted in the image-coded data extracting step does not include a non-VCL NAL unit having a layer identifier that is not equal to zero and is not included in the layer ID list.
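One plausible realization of the extraction rule recited above is sketched below in C++. The NalUnit record, and the handling of the temporal identifier for VCL NAL units, are assumptions for illustration rather than a normative implementation.

    #include <algorithm>
    #include <vector>

    // Hypothetical NAL unit record; field names mirror the NAL unit header syntax.
    struct NalUnit {
        bool isVcl;       // true for VCL NAL units (slice data)
        int  layerId;     // nuh_layer_id
        int  temporalId;  // TemporalId derived from nuh_temporal_id_plus1
    };

    // A non-VCL NAL unit survives extraction only if its layer identifier is
    // zero or appears in the target layer ID list, so parameter sets needed
    // by the target set are never discarded. VCL NAL units are kept only when
    // they belong to the target set.
    bool keepInTargetSet(const NalUnit& nal, const std::vector<int>& layerIdList,
                         int highestTid) {
        bool inList = std::find(layerIdList.begin(), layerIdList.end(),
                                nal.layerId) != layerIdList.end();
        if (nal.isVcl)
            return inList && nal.temporalId <= highestTid;
        return nal.layerId == 0 || inList;
    }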

Advantageous Effects of Disclosure

An aspect of the present disclosure specifies a bit stream constraint pertaining to a parameter set, and a bit stream extraction process, which can prevent occurrence of a layer that cannot be decoded on a bit stream which is generated through a bit stream extraction process from a bit stream including a layer set and which only includes a layer set that is a subset of that layer set.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram for illustrating an example of a problem occurring when a layer set B that is a subset of a certain layer set A is extracted from a bit stream including the layer set A, FIG. 1(a) shows an example of the layer set A, and FIG. 1(b) shows an example of the layer set B after bit stream extraction;

FIG. 2 is a diagram for illustrating the layer structure of hierarchically coded data according to an embodiment of the present disclosure, FIG. 2(a) shows a hierarchical video coding apparatus side, and FIG. 2(b) shows a hierarchical video decoding apparatus side;

FIG. 3 is a diagram for illustrating layers constituting a certain layer set, and a structure of a sub-layer (temporal layer);

FIG. 4 is a diagram for illustrating layers and sub-layers (temporal layers) that constitute a subset of a layer set extracted by a sub-bit-stream extracting process from the layer set shown in FIG. 3;

FIG. 5 is a diagram showing an example of the data structure that configures an NAL unit layer;

FIG. 6 is a diagram showing an example of a syntax included in the NAL unit layer, FIG. 6(a) shows an example of the syntax configuring the NAL unit layer, and FIG. 6(b) shows an example of the syntax of an NAL unit header;

FIG. 7 is a diagram showing the relationship between the value of the NAL unit type and the kind of the NAL unit according to the embodiment of the present disclosure;

FIG. 8 is a diagram showing an example of a configuration of the NAL units included in an access unit;

FIG. 9 is a diagram for illustrating a configuration of hierarchically coded data according to the embodiment of the present disclosure, FIG. 9(a) is a diagram showing a sequence layer defining a sequence SEQ, FIG. 9(b) is a diagram showing a picture layer defining a picture PICT, FIG. 9(c) is a diagram showing a slice layer defining a slice S, FIG. 9(d) is a diagram showing a slice data layer defining slice data, FIG. 9(e) is a diagram showing a coding tree layer defining a coded tree unit included in the slice data, and FIG. 9(f) is a diagram showing a coding unit layer defining a coding unit (Coding Unit; CU) included in the coding tree;

FIG. 10 is a diagram for illustrating a shared parameter set according to this embodiment;

FIG. 11 is a diagram for illustrating reference picture lists and reference pictures, FIG. 11(a) is a diagram schematically showing examples of the reference picture lists, and FIG. 11(b) is a diagram schematically showing examples of reference pictures;

FIG. 12 shows an example of a VPS syntax table according to the embodiment of the present disclosure;

FIG. 13 shows an example of a VPS extension data syntax table according to the embodiment of the present disclosure;

FIG. 14 is a diagram showing a layer dependency type according to this embodiment, FIG. 14(a) shows an example of including, as a dependency type, the presence or absence of non-VCL dependency, and FIG. 14(b) shows an example of including, as a dependency type, the presence or absence of a shared parameter set, and the presence or absence of prediction between parameter sets;

FIG. 15 shows an example of an SPS syntax table according to the embodiment of the present disclosure;

FIG. 16 shows an example of an SPS extension data syntax table according to the embodiment of the present disclosure;

FIG. 17 shows an example of a PPS syntax table according to the embodiment of the present disclosure;

FIG. 18 shows an example of a slice layer syntax table according to the embodiment of the present disclosure, FIG. 18(a) shows an example of a syntax table of the slice header included in the slice layer, and the slice data, FIG. 18(b) shows an example of a syntax table of the slice header, and FIG. 18(c) shows an example of a syntax table of the slice data;

FIG. 19 is a diagram schematically showing a configuration of a hierarchical video decoding apparatus according to this embodiment;

FIG. 20 is a schematic diagram showing a configuration of a target layer set picture decoding unit according to this embodiment;

FIG. 21 is a flowchart for illustrating the operation of a picture decoding unit according to this embodiment;

FIG. 22 is a diagram schematically showing the configuration of the hierarchical video decoding apparatus according to this embodiment;

FIG. 23 is a schematic diagram showing a configuration of a target layer set picture decoding unit according to this embodiment;

FIG. 24 is a flowchart for illustrating the operation of a picture decoding unit according to this embodiment;

FIG. 25 is a diagram showing configurations of a transmitting apparatus mounted with the hierarchical video coding apparatus, and a receiving apparatus mounted with the hierarchical video decoding apparatus, FIG. 25(a) shows the transmitting apparatus mounted with the hierarchical video coding apparatus, and FIG. 25(b) shows the receiving apparatus mounted with the hierarchical video decoding apparatus;

FIG. 26 is a diagram showing configurations of a recording apparatus mounted with the hierarchical video coding apparatus, and a reproducing apparatus mounted with the hierarchical video decoding apparatus, FIG. 26(a) shows the recording apparatus mounted with the hierarchical video coding apparatus, and FIG. 26(b) shows the reproducing apparatus mounted with the hierarchical video decoding apparatus;

FIG. 27 is a flowchart for illustrating the operation of a bit stream extraction unit according to this embodiment;

FIG. 28 is a flowchart for illustrating the operation of Variation Example 1 of the bit stream extraction unit according to this embodiment;

FIG. 29 is a flowchart for illustrating the operation of Variation Example 2 of the bit stream extraction unit according to this embodiment;

FIG. 30 is a flowchart for illustrating the operation of Variation Example 3 of the bit stream extraction unit according to this embodiment; and

FIG. 31 is a diagram for illustrating indirect reference layers.

DESCRIPTION OF EMBODIMENTS

Referring to FIGS. 2 to 31, a hierarchical video decoding apparatus 1 and a hierarchical video coding apparatus 2 according to an embodiment of the present disclosure are described as follows.

[Overview]

The hierarchical video decoding apparatus (image decoding apparatus) 1 according to this embodiment decodes coded data hierarchically coded by the hierarchical video coding apparatus (image coding apparatus) 2. The hierarchical coding is a coding scheme that codes video hierarchically from a low-quality component to a high-quality component. The hierarchical coding is standardized in, for example, SVC and SHVC. Here, the quality of video broadly means elements that affect the appearance of video from subjective and objective viewpoints. The quality of video includes, for example, “resolution”, “frame rate”, “image quality”, and “pixel representation accuracy”. Consequently, a difference in video quality hereinafter indicates a difference in “resolution” etc. in an exemplary manner; however, the difference is not limited thereto. For example, also in the case where video is quantized in different quantization steps (i.e., the case where video is coded with different amounts of coding noise), the video qualities can be regarded as different from each other.

The hierarchical coding technique may be classified into (1) spatial scalability, (2) temporal scalability, (3) SNR (Signal to Noise Ratio) scalability, and (4) view scalability, in view of the types of hierarchized information. The spatial scalability is a technique of hierarchization according to the resolution and the size of an image. The temporal scalability is a technique of hierarchization according to a frame rate (the number of frames per unit time). The SNR scalability is a technique of hierarchization according to coding noise. The view scalability is a technique of hierarchization according to viewpoint positions associated with respective images.

Prior to detailed description of the hierarchical video coding apparatus 2 and the hierarchical video decoding apparatus 1 according to this embodiment, (1) the layer structure of hierarchically coded data generated by the hierarchical video coding apparatus 2 and decoded by the hierarchical video decoding apparatus 1 is described, and subsequently (2) a specific example of data structures that can be adopted in respective layers is described.

[Layer Structure of Hierarchically Coded Data]

Here, referring to FIG. 2, coding and decoding of hierarchically coded data are described as follows. FIG. 2 is a diagram schematically showing the case of hierarchically coding/decoding video in three hierarchical layers: a lower hierarchical layer L3, a medium hierarchical layer L2, and a higher hierarchical layer L1. That is, in the example shown in FIGS. 2(a) and 2(b), among the three hierarchical layers, the higher hierarchical layer L1 is the highest layer, and the lower hierarchical layer L3 is the lowest layer.

A decoded image corresponding to a specific quality that can be decoded from the hierarchically coded data is hereinafter called a decoded image on a specific hierarchical layer (or a decoded image corresponding to a specific hierarchical layer) (e.g., a decoded image POUT # A on the higher hierarchical layer L1).

FIG. 2(a) shows hierarchical video coding apparatus 2# A to 2# C that generate coded data DATA # A to DATA # C by hierarchically coding respective input images PIN # A to PIN # C. FIG. 2(b) shows hierarchical video decoding apparatus 1# A to 1# C that generate decoded images POUT # A to POUT # C by decoding respective coded data DATA # A to DATA # C having been hierarchically coded.

First, referring to FIG. 2(a), the coding apparatus side is described. The input images PIN # A, PIN # B and PIN # C, which are to be inputs on the coding apparatus side, originate from the same image, but are different in image quality (resolution, frame rate, image quality, etc.). The image quality becomes lower in the order of the input images PIN # A, PIN # B and PIN # C.

The hierarchical video coding apparatus 2# C on the lower hierarchical layer L3 codes the input image PIN # C on the lower hierarchical layer L3 to generate the coded data DATA # C on the lower hierarchical layer L3. The coded data DATA # C includes basic information required for decoding to obtain the decoded image POUT # C on the lower hierarchical layer L3 (indicated as “C” in FIG. 2). The lower hierarchical layer L3 is the lowest hierarchical layer. Consequently, the coded data DATA # C on the lower hierarchical layer L3 is also called basic coded data.

The hierarchical video coding apparatus 2# B on the medium hierarchical layer L2 codes the input image PIN # B on the medium hierarchical layer L2 to generate the coded data DATA # B on the medium hierarchical layer L2 with reference to the coded data DATA # C on the lower hierarchical layer. The coded data DATA # B on the medium hierarchical layer L2 includes not only the basic information “C” included in the coded data DATA # C but also additional information (indicated as “B” in FIG. 2) required for decoding to obtain the decoded image POUT # B on the medium hierarchical layer.

The hierarchical video coding apparatus 2# A on the higher hierarchical layer L1 codes the input image PIN # A on the higher hierarchical layer L1 to generate the coded data DATA # A on the higher hierarchical layer L1 with reference to the coded data DATA # B on the medium hierarchical layer L2. The coded data DATA # A on the higher hierarchical layer L1 includes not only the basic information “C” required for decoding to obtain the decoded image POUT # C on the lower hierarchical layer L3 and the additional information “B” required for decoding to obtain the decoded image POUT # B on the medium hierarchical layer L2, but also additional information (indicated as “A” in FIG. 2) required for decoding to obtain the decoded image POUT # A on the higher hierarchical layer.

As described above, the coded data DATA # A on the higher hierarchical layer L1 includes information pertaining to the decoded images with variable qualities.

Now, referring to FIG. 2(b), the decoding apparatus side is described. On the decoding apparatus side, the decoding apparatus 1# A, 1# B and 1# C, which correspond to the higher hierarchical layer L1, the medium hierarchical layer L2 and the lower hierarchical layer L3, respectively, decode the coded data DATA # A, DATA # B and DATA # C to output the decoded images POUT # A, POUT # B and POUT # C.

Video with a specific quality can be reproduced by extracting a part of the information in the higher hierarchically coded data (also called bit stream extraction) and by decoding the extracted information in a specific decoding apparatus on a lower level.

For example, the hierarchical decoding apparatus 1# B on the medium hierarchical layer L2 may extract the information required for decoding to obtain the decoded image POUT # B (i.e., “B” and “C” included in the hierarchically coded data DATA # A) from the hierarchically coded data DATA # A on the higher hierarchical layer L1, and perform decoding to obtain the decoded image POUT # B. In other words, on the decoding apparatus side, the decoded images POUT # A, POUT # B and POUT # C can be obtained through decoding on the basis of the information included in the hierarchically coded data DATA # A on the higher hierarchical layer L1.

The hierarchically coded data is not limited to the three-hierarchical-layered data described above. Alternatively, the hierarchically coded data may be hierarchically coded in two hierarchical layers, or in more than three hierarchical layers.

A part of or the entire coded data pertaining to the decoded image on a specific hierarchical layer may be coded independently of the other hierarchical layers to configure the hierarchically coded data so as to negate the need to refer to information on the other hierarchical layers during decoding on the specific hierarchical layer. For example, the description has been made such that in the example described above with reference to FIGS. 2(a) and 2(b), “C” and “B” are referred to for decoding to obtain the decoded image POUT # B. However, the reference is not limited thereto. The hierarchically coded data can be configured so as to allow the decoded image POUT # B to be obtained through decoding using only “B”. For example, a hierarchical video decoding apparatus can be configured that receives, as inputs, hierarchically coded data consisting only of “B” and the decoded image POUT # C, for decoding to obtain the decoded image POUT # B.

In the case of achieving the SNR scalability, the hierarchically coded data may be generated so that, even when the same original image is used as the input images PIN # A, PIN # B and PIN # C, the decoded images POUT # A, POUT # B and POUT # C have different image qualities. In this case, the hierarchical video coding apparatus on the lower hierarchical layer generates the hierarchically coded data by quantizing the predictive residue using a larger quantization width than the hierarchical video coding apparatus on the higher hierarchical layer does.

In this description, for the sake of illustration, terms are defined as follows. The following terms are used to represent the technical matters described below if not otherwise specified.

VCL NAL unit: a VCL (Video Coding Layer) NAL unit is an NAL unit containing video (video signal) coded data. For example, the VCL NAL unit contains slice data (CTU coded data), and header information (a slice header) commonly used for decoding the slice. Coded data stored in the VCL NAL unit is called VCL.

non-VCL NAL unit: a non-VCL (non-Video Coding Layer) NAL unit is an NAL unit that contains coded data, such as header information, which is a set of coding parameters used for decoding to obtain sequences and pictures, such as a video parameter set VPS, a sequence parameter set SPS, and a picture parameter set PPS. Coded data stored in the non-VCL NAL unit is called non-VCL.

Layer identifier: a layer identifier (also called a layer ID) is for identifying a hierarchical layer (layer), and corresponds to the hierarchical layer on a one-to-one basis. The hierarchically coded data contains an identifier used to select the partial coded data required for decoding to obtain a decoded image on a specific hierarchical layer. A subset of hierarchically coded data associated with a layer identifier corresponding to a specific layer is also called a layer representation.

Typically, for the sake of decoding to obtain a decoded image on a specific hierarchical layer, the layer representation on the hierarchical layer and/or the layer representations corresponding to the lower layers of the hierarchical layer concerned are used. That is, for the sake of decoding to obtain the decoded image on the target layer, the layer representation on the target layer and/or the layer representation on at least one hierarchical layer included among the lower layers of the target layer are used.

Layer: a set of VCL NAL units having the layer identifier value (nuh_layer_id, nuhLayerId) of a specific hierarchical layer and the non-VCL NAL units associated with the VCL NAL units, or a set of syntax structures having a hierarchical relationship.

Higher layer: a hierarchical layer disposed higher than a certain hierarchical layer is called a higher layer. For example, in FIG. 2, the higher layers of the lower hierarchical layer L3 are the medium hierarchical layer L2 and the higher hierarchical layer L1. A decoded image on a higher layer means a decoded image with a higher quality (e.g., higher resolution, higher frame rate, higher image quality, etc.).

Lower layer: a hierarchical layer disposed lower than a certain hierarchical layer is called a lower layer. For example, in FIG. 2, the lower layers of the higher hierarchical layer L1 are the medium hierarchical layer L2 and the lower hierarchical layer L3. A decoded image on a lower layer means a decoded image with a lower quality.

Target layer: a hierarchical layer that is a target of decoding or coding. A decoded image corresponding to the target layer is called a target layer picture. Pixels constituting the target layer picture are referred to as target layer pixels.

Reference layer: a specific lower layer to be referred to for decoding to obtain the decoded image corresponding to the target layer is called a reference layer. The decoded image corresponding to the reference layer is called a reference layer picture. Pixels constituting the reference layer are referred to as reference layer pixels.

In the example shown in FIGS. 2(a) and 2(b), the reference layers of the higher hierarchical layer L1 are the medium hierarchical layer L2 and the lower hierarchical layer L3. However, the configuration is not limited thereto. Alternatively, the hierarchically coded data may be configured so as to negate the need to refer to all the lower layers during decoding to obtain the specific layer. For example, the hierarchically coded data may be configured for the reference layer of the higher hierarchical layer L1 to be either of the medium hierarchical layer L2 and the lower hierarchical layer L3. The reference layer can be represented as a layer, different from the target layer, that is used (referred to) to predict the coding parameters and the like used to decode the target layer. A reference layer directly referred to during inter-layer prediction on the target layer is also called a direct reference layer. A direct reference layer B, which is referred to in inter-layer prediction on the direct reference layer A of the target layer, is also called an indirect reference layer of the target layer.
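Indirect reference layers can be computed as the transitive closure of the direct-dependency relation. The following C++ sketch derives them from a matrix modeled after direct_dependency_flag[i][j] (a VPS syntax element introduced later); indexing layers by a dense layer index rather than by nuh_layer_id is a simplifying assumption.

    #include <vector>

    // Mark every direct and indirect reference layer of the target layer,
    // given directDep[i][j] == true when layer j is a direct reference layer
    // of layer i (modeled after direct_dependency_flag[i][j]).
    std::vector<bool> referenceLayerClosure(
            const std::vector<std::vector<bool>>& directDep, int targetLayer) {
        const int numLayers = static_cast<int>(directDep.size());
        std::vector<bool> isReference(numLayers, false);
        std::vector<int> pending = {targetLayer};
        while (!pending.empty()) {
            int i = pending.back();
            pending.pop_back();
            for (int j = 0; j < numLayers; ++j) {
                if (directDep[i][j] && !isReference[j]) {
                    isReference[j] = true;  // direct or indirect reference layer
                    pending.push_back(j);
                }
            }
        }
        return isReference;
    }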

Basic layer: the hierarchical layer arranged as the lowest layer is called a basic layer. A decoded image on the basic layer is the decoded image with the lowest quality obtained by decoding the coded data, and is called a basic decoded image. In other words, the basic decoded image is the decoded image corresponding to the lowest hierarchical layer. The partial coded data of the hierarchically coded data required for decoding to obtain the basic decoded image is called basic coded data. For example, the basic information “C” contained in the hierarchically coded data DATA # A on the higher hierarchical layer L1 is the basic coded data.

Extended layer: a higher layer of the basic layer is called an extended layer.

Inter-layer prediction: the inter-layer prediction is prediction of a syntax element value on the target layer, a coding parameter used to decode the target layer and the like, on the basis of a syntax element value included in the layer representation on a hierarchical layer (reference layer) different from that of the target layer, of a value derived from that syntax element value, and of a decoded image. The inter-layer prediction that predicts information pertaining to motion information from the information on the reference layer may be called inter-layer motion information prediction. The inter-layer prediction from a decoded image on a lower layer may be called inter-layer image prediction (or inter-layer texture prediction). The hierarchical layer used for inter-layer prediction is exemplified as a lower layer of the target layer. Prediction in the target layer without using a reference layer may be called intra-layer prediction.

Temporal identifier: a temporal identifier (also called a temporal ID, time identifier, sub-layer ID or sub-layer identifier) is an identifier for identifying a layer pertaining to temporal scalability (hereinafter called a sub-layer). The temporal identifier is for identifying a sub-layer, and corresponds to the sub-layer on a one-to-one basis. The coded data contains a temporal identifier used to select the partial coded data required for decoding to obtain a decoded image on a specific sub-layer. In particular, the temporal identifier on the highest (uppermost) sub-layer is called the highest (uppermost) temporal identifier (highest TemporalId, highestTid).

Sub-layer: a sub-layer is a layer pertaining to temporal scalability identified by the temporal identifier. For the sake of discrimination from other scalabilities, such as spatial scalability and SNR scalability, it is hereinafter called a sub-layer (also called a temporal layer). It is hereinafter assumed that the temporal scalability is achieved by sub-layers contained in the coded data on the basic layer, or in the hierarchically coded data required for decoding on a certain layer.

Layer set: a layer set is a set of layers that includes at least one layer.

Bit stream extraction process: a bit stream extraction process is a process that removes (discards), from a certain bit stream (hierarchically coded data, coded data), the NAL units that are not contained in a set (called a TargetSet) defined by the target highest temporal identifier (highest TemporalId, highestTid) and a layer ID list indicating the layers contained in the target layer set (also called LayerSetLayerIdList[ ], LayerIdList[ ]), and extracts a bit stream (also called a sub-bit-stream) including the NAL units contained in the target set. The bit stream extraction is also called sub-bit-stream extraction. It is assumed that the layer IDs contained in the layer set are stored in the respective elements of the layer ID list LayerSetLayerIdList[K] (K=0 . . . N−1, where N is the number of layers contained in the layer set) in ascending order. The target highest temporal identifier is also called HighestTidTarget, the target layer set is also called LayerSetTarget, and the layer ID list of the target layer set (target layer ID list) is also called LayerIdListTarget. The bit stream (image coded data) that is generated through bit stream extraction and includes the NAL units contained in the target set is also called decoding target image coded data (BitstreamToDecode).
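The conventional process defined above can be sketched in C++ as follows; the NalUnit record is hypothetical, and std::binary_search is valid because the layer ID list is stored in ascending order. Note that, unlike the predicate sketched in the Summary above, this baseline treats VCL and non-VCL NAL units identically, which is precisely what the constraints discussed in this disclosure refine.

    #include <algorithm>
    #include <vector>

    // Hypothetical NAL unit record carrying the two header fields used here.
    struct NalUnit {
        int layerId;     // nuh_layer_id
        int temporalId;  // TemporalId
    };

    // Conventional sub-bit-stream extraction: discard every NAL unit whose
    // layer ID is absent from LayerIdListTarget or whose temporal ID exceeds
    // HighestTidTarget; the survivors form BitstreamToDecode.
    std::vector<NalUnit> extractSubBitstream(const std::vector<NalUnit>& bitstream,
                                             const std::vector<int>& layerIdListTarget,
                                             int highestTidTarget) {
        std::vector<NalUnit> bitstreamToDecode;
        for (const NalUnit& nal : bitstream) {
            bool inList = std::binary_search(layerIdListTarget.begin(),
                                             layerIdListTarget.end(), nal.layerId);
            if (inList && nal.temporalId <= highestTidTarget)
                bitstreamToDecode.push_back(nal);  // NAL unit is in the TargetSet
        }
        return bitstreamToDecode;
    }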

Next, referring to FIGS. 3 and 4, an example is described where the bit stream extraction process extracts, from the hierarchically coded data containing a certain layer set A, the hierarchically coded data containing a layer set B, which is a subset of the layer set A.

FIG. 3 shows the configuration of the layer set A that includes three layers (L #0, L #1 and L #2) each made up of three sub-layers (TID1, TID2 and TID3). The layers and sub-layers that constitute a layer set are hereinafter represented as {layer ID list {L #0, . . . , L # N}, highest temporal ID=K}, or {LayerIdList={L #0, . . . , L # N}, HighestTid=K}. For example, the layer set A in FIG. 3 is represented as {layer ID list {L #0, L #1, L #2}, highest temporal ID=3}, or {LayerIdList={L #0, L #1, L #2}, HighestTid=3}. Here, the symbol L # N indicates a certain layer N, the boxes in FIG. 3 each indicate a picture, and the numbers in the respective boxes indicate an example of a decoding order. Hereinafter, a picture with a number N is represented as P # N, which likewise applies to FIG. 4.

Arrows between pictures represent dependence directions between pictures (reference relationships). Arrows in the same layer represent reference pictures used for inter prediction. Arrows between layers represent reference pictures used for inter-layer prediction (also called reference layer pictures).

AU in FIG. 3 represents an access unit, and the symbol # N indicates the access unit number. Provided that the AU at a certain starting point (e.g., the start point of random access) is represented as AU #0, AU # N represents the N-th access unit after AU #0, and the numbers represent the order of the AUs contained in the bit stream. That is, in the example of FIG. 3, the access units are stored in the bit stream in the order of AU #0, AU #1, AU #2, AU #3, AU #4 . . . . An access unit represents a set of NAL units compiled according to a specific classification rule. AU #0 in FIG. 3 can be regarded as a set of VCL NALs containing the coded data of pictures P #1, P #2 and P #3. The details of the access unit are described later.

In the example of FIG. 3, the target set (layer set B) is {LayerIdList={L #0, L #1}, HighestTid=2}. Consequently, the layers that are not contained in the target set (layer set B), and the sub-layers having values higher than the highest temporal ID (HighestTid=2), are discarded from the bit stream containing the layer set A through bit stream extraction. That is, the layer L #2, which is not included in the layer ID list, and the sub-layer (TID3), which exceeds the highest temporal ID, are discarded. Finally, as shown in FIG. 4, the bit stream containing the layer set B is extracted. In FIG. 4, broken-line boxes represent discarded pictures. Broken-line arrows indicate the dependence directions between the discarded pictures and the reference pictures. As the NAL units constituting the pictures on the layer L #2 and the sub-layer TID3 have already been discarded, the dependency relationships have already been broken off.

SHVC and MV-HEVC adopt the concepts of layers and sub-layers in order to achieve the SNR scalability, the spatial scalability, the temporal scalability and the like. As described with FIGS. 3 and 4, in the case of changing the frame rate to achieve the temporal scalability, the coded data of pictures (with the highest temporal ID (TID3)) which are not referred to by other pictures is discarded through the bit stream extraction process. In the case of FIGS. 3 and 4, the coded data of the pictures (10, 13, 11, 14, 12 and 15) is discarded, thereby generating coded data at half the frame rate.

In the case of achieving the SNR scalability, the spatial scalability and the view scalability, the coded data on the layers not contained in the target set is discarded through bit stream extraction, thereby changing the granularity of each scalability. In the case of FIGS. 3 and 4, the coded data of the pictures (3, 6, 9, 12 and 15) is discarded, thereby generating coded data with a coarser granularity of scalability. Repetition of the above process can gradually adjust the granularities of layers and sub-layers.

In FIG. 3, if there is no dependency relationship, such as inter-layer prediction, between the layers L #0 and L #1, the bit stream containing the layer set C {LayerIdList={L #1}, HighestTid=2}, which is a subset of the layer set A consisting of a single layer, can be extracted from the bit stream of the layer set A through bit stream extraction.

The terms described above are used for convenience of description. Consequently, the above technical matters may be represented in other terms.

[Data Structure of Hierarchically Coded Data]

The case of using HEVC and its extended scheme is exemplified as the coding scheme for generating coded data on the respective hierarchical layers. However, the example is not limited thereto. Alternatively, the coded data on the respective hierarchical layers may be generated according to a coding scheme such as MPEG-2 or H.264/AVC.

The lower layer and the higher layer may be coded according to different coding schemes. The coded data on the respective hierarchical layers may be supplied to the hierarchical video decoding apparatus 1 through transmission paths different from each other, or through the same transmission path.

For example, in the case of scalable-coding ultrahigh-definition video (video, 4K video data) through the basic layer and one extended layer and transmitting the coded video, the basic layer may code video data obtained by downscaling and interlacing the 4K video data according to MPEG-2 or H.264/AVC and transmit the coded data through a television broadcasting network, while the extended layer may code the 4K video (progressive) through HEVC and transmit the coded video via the Internet.

<Structure of Hierarchically Coded Data DATA>

Prior to detailed description of the image coding apparatus 2 and the image decoding apparatus 1 according to this embodiment, the data structure of the hierarchically coded data DATA, which is generated by the image coding apparatus 2 and decoded by the image decoding apparatus 1, is described.

(NAL Unit Layer)

FIG. 5 is a diagram showing the hierarchical layer data structure of the hierarchically coded data DATA. The hierarchically coded data DATA is coded on the basis of a unit called an NAL (Network Abstraction Layer) unit.

NAL is a layer provided to abstract communication between the VCL (Video Coding Layer), which is a layer for performing a video coding process, and a lower system for transmitting and accumulating the coded data.

VCL is a layer for performing an image coding process; coding is performed on the VCL. Meanwhile, the so-called lower system corresponds to the file formats of H.264/AVC and HEVC, the MPEG-2 system, etc. In the example described below, the lower system corresponds to the decoding processes on the target layer and the reference layer. In NAL, the bit stream generated on the VCL is delimited into units called NAL units, and transmitted to the lower system, which is the destination.

FIG. 6(a) shows a syntax table of an NAL (Network Abstraction Layer) unit. The NAL unit includes coded data that is coded on the VCL, and a header (NAL unit header: nal_unit_header( ) (SYNNAL01 in FIG. 6)) for allowing the coded data to be appropriately delivered to the lower system, which is the destination. The NAL unit header is, for example, represented according to the syntax shown in FIG. 6(b). In the NAL unit header, there are described “nal_unit_type” that represents the type of the coded data stored in the NAL unit, “nuh_temporal_id_plus1” that represents the identifier (temporal identifier) of the sub-layer to which the stored coded data belongs, and “nuh_layer_id” (or nuh_reserved_zero_6bits) that represents the identifier (layer identifier) of the layer to which the stored coded data belongs. Meanwhile, the NAL unit data (SYNNAL02 in FIG. 6) includes a parameter set, SEI, a slice, etc., which are described later.
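For reference, these three fields occupy a fixed two-byte header in Recommendation H.265: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits), and nuh_temporal_id_plus1 (3 bits). A minimal C++ parsing sketch follows.

    #include <cstdint>

    // Fields of the two-byte NAL unit header described above.
    struct NalUnitHeader {
        int nalUnitType;         // nal_unit_type (6 bits)
        int nuhLayerId;          // nuh_layer_id (6 bits)
        int nuhTemporalIdPlus1;  // nuh_temporal_id_plus1 (3 bits)
    };

    // Parse the header from its two bytes; TemporalId is nuh_temporal_id_plus1 - 1.
    NalUnitHeader parseNalUnitHeader(uint8_t b0, uint8_t b1) {
        NalUnitHeader h;
        h.nalUnitType        = (b0 >> 1) & 0x3F;                        // bits 6..1 of byte 0
        h.nuhLayerId         = ((b0 & 0x01) << 5) | ((b1 >> 3) & 0x1F); // straddles the two bytes
        h.nuhTemporalIdPlus1 = b1 & 0x07;                               // low 3 bits of byte 1
        return h;
    }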

FIG. 7 is a diagram showing the relationship between the value of the NAL unit type and the kind of the NAL unit. As shown in FIG. 7, NAL units having NAL unit types with values ranging from 0 to 15, indicated by SYNA101, are slices of non-RAP (random access picture) pictures. NAL units having NAL unit types with values ranging from 16 to 21, indicated by SYNA102, are slices of RAP pictures (random access point pictures, IRAP pictures). The RAP pictures are roughly classified into BLA pictures, IDR pictures, and CRA pictures. The BLA pictures are further classified into BLA_W_LP, BLA_W_DLP and BLA_N_LP. The IDR pictures are further classified into IDR_W_DLP and IDR_N_LP. Pictures other than the RAP pictures include leading pictures (LP pictures), temporal access pictures (TSA pictures, STSA pictures), and trailing pictures (TRAIL pictures). The coded data on each hierarchical layer is stored in an NAL unit to thus be NAL-multiplexed, and is transmitted to the hierarchical video decoding apparatus 1.

As shown in the NAL Unit Type Class in FIG. 7, the NAL units are classified into data (VCL data) constituting pictures and other data (non-VCL), according to the NAL unit type. All the pictures are classified into the VCL NAL units regardless of the kind of picture, such as random access pictures, leading pictures, and trailing pictures. A parameter set that is data required for decoding to obtain a picture, SEI that is auxiliary information on a picture, an access unit delimiter (AUD) that indicates a delimitation between access units, an end of sequence (EOS), an end of bit stream (EOB) (SYNA103 in FIG. 7) and the like are classified into non-VCL NAL units.
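The VCL/non-VCL split can be tested directly on nal_unit_type. The following C++ sketch reflects the value ranges of FIG. 7, with non-VCL type values as in Recommendation H.265; it is illustrative only.

    // Selected non-VCL nal_unit_type values from Recommendation H.265.
    enum : int {
        NAL_VPS = 32, NAL_SPS = 33, NAL_PPS = 34, NAL_AUD = 35,
        NAL_EOS = 36, NAL_EOB = 37, NAL_PREFIX_SEI = 39, NAL_SUFFIX_SEI = 40
    };

    // Types 0..31 carry slice data (VCL); types 32 and above are non-VCL.
    bool isVclNalUnit(int nalUnitType) { return nalUnitType <= 31; }

    // RAP (IRAP) pictures occupy the VCL range 16..21 shown by SYNA102 in
    // FIG. 7 (the BLA, IDR, and CRA picture types).
    bool isRapNalUnit(int nalUnitType) { return nalUnitType >= 16 && nalUnitType <= 21; }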

(Access Unit)

A set of NAL units compiled according to the specific classification rule is called an access unit. In the case where the number of layers is one, the access unit is a set of NAL units configuring one picture. In the case where the number of layers is more than one, the access unit is a set of NAL units configuring the pictures on the multiple layers at the same time. In order to indicate the delimitation between access units, the coded data may include an NAL unit called an access unit delimiter (Access unit delimiter). The access unit delimiter is included between a set of NAL units configuring a certain access unit in the coded data, and a set of NAL units configuring another access unit.

FIG. 8 is a diagram showing an example of the configuration of the NAL units included in an access unit. As shown in FIG. 8, an AU includes NAL units such as an access unit delimiter (AUD) that shows the leading position of the AU, various parameter sets (VPS, SPS, and PPS), various SEIs (Prefix SEI and Suffix SEI), the VCL (slice) constituting one picture in the case where the number of layers is one, the VCLs constituting as many pictures as the number of layers in the case where the number of layers is more than one, an EOS (End of Sequence) showing the end of a sequence, and an EOB (End of Bitstream) showing the end of a bit stream. In FIG. 8, the codes L # K (K=Nmin . . . Nmax) after VPS, SPS, SEI, and VCL represent layer IDs. According to the example of FIG. 8, the SPS, PPS, SEI and VCL of each of the layers L # Nmin to L # Nmax, except the VPS, are in the AU in ascending order of the layer ID. The VPS is transmitted only with the lowest layer ID. FIG. 8 shows whether a specific NAL unit is present in the AU, or repetitively present, by means of arrows. For example, the case of presence of the specific NAL unit in the AU is indicated by an arrow passing through the NAL unit. The case of absence of the specific NAL unit in the AU is indicated by an arrow skipping this NAL unit. For example, an arrow that does not pass through the AUD but goes toward the VPS indicates the case of absence of the AUD in the AU. Although a VPS having a layer ID other than the lowest one may be included in the AU, the image decoding apparatus ignores any VPS having a layer ID other than the lowest one. Various parameter sets (VPS, SPS and PPS), and SEI, which is auxiliary information, may be included as a part of the access unit as shown in FIG. 8, or may be transmitted to a decoder through means different from the bit stream.

In particular, an access unit including an IRAP picture with a layer identifier nuhLayerId=0 is called an IRAP access unit (random access point access unit). An IRAP access unit that performs initialization of the process of decoding all the layers included in the decoding target layer set is called an initialization IRAP access unit. According to the decoding order, a set of access units ranging from an initialization IRAP access unit, through zero or more non-initialization IRAP access units (access units other than initialization IRAP access units), to the next initialization IRAP access unit (note that the next initialization IRAP access unit is excluded) is also called a CVS (Coded Video Sequence; hereinafter also called sequence SEQ).

FIG. 9 is a diagram showing the hierarchical layer data structure of the hierarchically coded data DATA. The hierarchically coded data DATA includes a sequence and multiple pictures configuring the sequence in an exemplary manner. FIGS. 9(a) to 9(f) are diagrams showing a sequence layer defining a sequence SEQ, a picture layer defining a picture PICT, a slice layer defining a slice S, a slice data layer defining slice data, a coding tree layer defining a coded tree unit included in the slice data, and a coding unit layer defining a coding unit (Coding Unit; CU) included in the coding tree, respectively.

(Sequence Layer)

The sequence layer defines a set of data referred to by the image decoding apparatus 1 to decode the processing target sequence SEQ (hereinafter also called a target sequence). As shown in FIG. 9(a), the sequence SEQ contains a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), a picture PICT, and supplemental enhancement information (SEI). Here, the value shown after # indicates the layer ID. FIG. 9 shows an example of the presence of coded data having #0 and #1, i.e., a layer ID of zero and a layer ID of one. However, the kinds of layers and the number of layers are not limited thereto.

(Video Parameter Set)

The video parameter set VPS defines a set of coding parameters referred to by the image decoding apparatus 1 for decoding coded data made up of at least one layer. For example, a VPS identifier (video_parameter_set_id) used for identifying the VPS referred to by the after-mentioned sequence parameter set and other syntax elements, the number of layers included in the coded data (vps_max_layers_minus1), the number of sub-layers included in a layer (vps_sub_layers_minus1), the number of layer sets (vps_num_layer_sets_minus1), each layer set being a set of layers including at least one layer represented in the coded data, layer set configuration information (layer_id_included_flag[i][j]) defining the set of layers constituting a layer set, the inter-layer dependency relationships (direct dependency flag direct_dependency_flag[i][j], layer dependency type direct_dependency_type[i][j]) and the like are defined. Multiple VPSs may be in the coded data. In this case, a VPS used for decoding is selected from among the VPSs for each target sequence. The VPS used for decoding to obtain a specific sequence belonging to a certain layer is called an active VPS. VPSs applied to the basic layer and the extended layer may be discriminated, and the VPS for the basic layer (layer ID=0) may be called an active VPS while the VPS for the extended layer (layer ID>0) may be called an active layer VPS. Hereinafter, the VPS means the active VPS for the target sequence belonging to a certain layer, if not otherwise specified. The VPS that has the layer ID=nuhLayerIdA and is used to decode the layer having the layer ID=nuhLayerIdA may be used for decoding a layer (nuhLayerIdB, nuhLayerIdB>nuhLayerIdA) with a layer ID higher than nuhLayerIdA.
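The syntax elements listed above can be pictured as the following C++ container, together with a derivation of a layer set's layer ID list from layer_id_included_flag. Field names mirror the syntax; the array dimensions and the helper are simplifying assumptions.

    #include <vector>

    // Illustrative container for the VPS syntax elements named above.
    struct Vps {
        int videoParameterSetId;   // video_parameter_set_id
        int vpsMaxLayersMinus1;    // number of layers in the coded data, minus 1
        int vpsSubLayersMinus1;    // number of sub-layers, minus 1
        int vpsNumLayerSetsMinus1; // number of layer sets, minus 1
        std::vector<std::vector<bool>> layerIdIncludedFlag;  // [lsIdx][layerId]
        std::vector<std::vector<bool>> directDependencyFlag; // [i][j]: j referenced by i
        std::vector<std::vector<int>>  directDependencyType; // [i][j]: dependency type
    };

    // Derive the layer ID list of layer set lsIdx in ascending order, as
    // assumed for LayerSetLayerIdList[] in this description.
    std::vector<int> layerIdListOf(const Vps& vps, int lsIdx) {
        std::vector<int> layerIdList;
        const auto& included = vps.layerIdIncludedFlag[lsIdx];
        for (int id = 0; id < static_cast<int>(included.size()); ++id)
            if (included[id])
                layerIdList.push_back(id);
        return layerIdList;
    }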

(Bit Stream Constraint Pertaining to VPS)

As to the VPS, a bit stream constraint (also called bit stream conformance) “the VPS has the same layer ID as the lowest layer ID among the VCLs included in the access unit, and the temporal ID is zero (tId=0)” is assumed between the decoder and the encoder. Here, the bit stream conformance is a constraint required to be satisfied by a bit stream decoded by the hierarchical video decoding apparatus (here, the hierarchical video decoding apparatus according to the embodiment of the present disclosure). Likewise, as to a bit stream generated by the hierarchical video coding apparatus (here, the hierarchical video coding apparatus according to the embodiment of the present disclosure), the bit stream conformance is required to be satisfied to securely allow the bit stream to be decoded by the hierarchical video decoding apparatus. That is, in consideration of the bit stream conformance, the bit stream is required to satisfy at least the following condition CX1.

CX1: “in the case where the VPS with the layer identifier nuhLayerIdA is an active VPS of the layer with the layer identifier nuhLayerIdB (nuhLayerIdB>=nuhLayerIdA), the layer identifier nuhLayerIdA is the same as the lowest layer identifier of the VCLs included in the access unit”.

Alternatively, the condition CX1 can also be represented as the following condition CX1′.

CX1′: “in the case where the VPS having the layer identifier nuh_layer_id equal to nuhLayerIdA is an active VPS of the layer having the layer identifier nuh_layer_id equal to nuhLayerIdB (nuhLayerIdB>=nuhLayerIdA), the layer identifier nuhLayerIdA is equal to the lowest layer identifier of the VCLs included in the access unit”.

In other words, the bit stream constraint CX1 (CX1′) means that the VPS referred to by the target layer belongs to the same layer as the VCL having the lowest layer identifier among the VCLs included in the access unit, which is a set of NAL units of the target layer set.

“The VPS referred to by the target layer belongs to the same layer as the VCL having the lowest layer identifier among the VCLs included in the access unit, which is a set of NAL units of the target layer set” means that, in the case where a layer in the layer set B that is a subset of the layer set A refers, in the layer set A, to the VPS of a layer that is included in the layer set A but is not included in the layer set B, then, in the layer set B extracted through bit stream extraction, a VPS having the same coding parameters as the aforementioned VPS is included in the layer set B. A VPS having the same coding parameters as the aforementioned VPS indicates that the VPS identifier and the other syntax in the VPS are the same as in the aforementioned VPS except for the layer identifier and the temporal identifier. Consequently, provision of the bit stream constraint can solve the problem in that the VPS is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream on a certain layer set and only includes a layer set that is a subset of that layer set.
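A conformance checker could verify the condition CX1 for each access unit along the lines of the following C++ sketch. The structures are hypothetical, and a complete checker would also track VPS activation across the CVS rather than inspecting a single access unit in isolation.

    #include <algorithm>
    #include <climits>
    #include <vector>

    // Hypothetical view of one NAL unit within an access unit.
    struct Nal {
        bool isVcl;   // VCL NAL unit (slice data)
        bool isVps;   // VPS NAL unit
        int  layerId; // nuh_layer_id
    };

    // Condition CX1: every VPS in the access unit must carry the same layer
    // identifier as the lowest layer identifier among the VCL NAL units.
    bool satisfiesCx1(const std::vector<Nal>& accessUnit) {
        int lowestVclLayerId = INT_MAX;
        for (const Nal& n : accessUnit)
            if (n.isVcl)
                lowestVclLayerId = std::min(lowestVclLayerId, n.layerId);
        for (const Nal& n : accessUnit)
            if (n.isVps && n.layerId != lowestVclLayerId)
                return false;  // CX1 violated
        return true;
    }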

(Variation Example 1 of Bit Stream Constraint Pertaining to VPS)

The constraint pertaining to the VPS may be “the layer ID of the VPS is the lowest layer ID in the layer set, and the temporal ID is zero (tId=0)”.

That is, in consideration of the bit stream conformance, the bit stream is required to satisfy at least the following condition CX2.

CX2: “in the case where the VPS with the layer identifier nuhLayerIdA is an active VPS of the layer with the layer identifier nuhLayerIdB (nuhLayerIdB>=nuhLayerIdA), the layer identifier nuhLayerIdA is the lowest layer identifier in the layer set”.

The condition CX2 may also be represented as the following condition CX2′.

CX2′: “in the case where the VPS having the layer identifier nuh_layer_id equal to nuhLayerIdA is an active VPS of the layer having the layer identifier nuh_layer_id equal to nuhLayerIdB (nuhLayerIdB>=nuhLayerIdA), the layer identifier nuhLayerIdA is the lowest layer identifier in the layer set”.

In other words, the bit stream constraint CX2 (CX2′) means that the VPS referred to by the target layer is the VPS having the lowest layer identifier in the target layer set.

“The VPS referred to by the target layer is the VPS having the lowest layer identifier in the target layer set” means “in the case where a layer in the layer set B, which is a subset of the layer set A, refers to the VPS of a layer that is included in the layer set A but is not included in the layer set B, then in the layer set B extracted through bit stream extraction, a VPS having the same coding parameters as the aforementioned VPS is included in the layer set B”.

Consequently, provision of the bit stream constraint can solve the problem in that the VPS is not included in the layer set on the bit stream after bit stream extraction. That is, the problem that can occur in the conventional art shown in FIG. 1 can be solved; in other words, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream on a certain layer set and only includes a layer set that is a subset of the layer set.

(Sequence Parameter Set)

The sequence parameter set SPS defines a set of coding parameters referred to by the image decoding apparatus 1 for decoding the target sequence. For example, the active VPS identifier (sps_video_parameter_set_id) indicating the active VPS referred to by the target SPS, the SPS identifier (sps_seq_parameter_set_id) used to identify the SPS referred to by the after-mentioned picture parameter set and other syntax elements, and the width and height of a picture are defined. Multiple SPSs may be present in the coded data. In this case, an SPS used for decoding is selected from among the SPSs for each target sequence. The SPS used to decode a specific sequence belonging to a certain layer is also called an active SPS. The SPSs applied to the basic layer and the extended layer may be discriminated from each other; the SPS for the basic layer may be called an active SPS while the SPS for the extended layer may be called an active layer SPS. Hereinafter, the SPS means the active SPS used for decoding the target sequence belonging to a certain layer, if not otherwise specified. The SPS that has the layer ID=nuhLayerIdA and is used to decode the sequence belonging to the layer having the layer ID=nuhLayerIdA may also be used for decoding the sequence belonging to a layer (nuhLayerIdB, nuhLayerIdB>nuhLayerIdA) with a layer ID higher than nuhLayerIdA. A constraint that the temporal ID of the SPS is zero (tId=0) (also called a bit stream constraint) is hereinafter assumed between the decoder and the encoder, if not otherwise specified.

(Picture Parameter Set)

The picture parameter set PPS defines a set of coding parameters referred to by the image decoding apparatus 1 for decoding each picture in the target sequence. For example, the set includes the active SPS identifier (pps_seq_parameter_set_id) indicating the active SPS referred to by the target PPS, the PPS identifier (pps_pic_parameter_set_id) used to identify the PPS referred to by an after-mentioned slice header and other syntax elements, the reference value (pic_init_qp_minus26) of the quantization width used for picture decoding, a flag (weighted_pred_flag) representing application of weighted prediction, and a scaling list (quantization matrix). Note that multiple PPSs may be present. In this case, one of the PPSs is selected for each picture in the target sequence. The PPS used for decoding a specific picture belonging to a certain layer is called an active PPS. The PPSs applied to the basic layer and the extended layer may be discriminated from each other; the PPS for the basic layer may be called an active PPS while the PPS for the extended layer may be called an active layer PPS. Hereinafter, the PPS means the active PPS for the target picture belonging to a certain layer, if not otherwise specified. The PPS that has the layer ID=nuhLayerIdA and is used to decode the picture belonging to the layer having the layer ID=nuhLayerIdA may also be used for decoding the picture belonging to a layer (nuhLayerIdB, nuhLayerIdB>nuhLayerIdA) with a layer ID higher than nuhLayerIdA.

The active SPS and the active PPS may be set to a different SPS and PPS for each layer. That is, the decoding process can be performed with reference to a different SPS and PPS for each layer.

(Picture Layer)

The picture layer defines a set of data referred to by the hierarchical video decoding apparatus 1 to decode the processing target picture PICT (hereinafter, also called a target picture). As shown in FIG. 9(b), the picture PICT includes slices S0 to SNS−1 (NS is the total number of slices included in the picture PICT).

Hereinafter, in the case where the slices S0 to SNS−1 are not required to be discriminated from each other, the description may sometimes be made without the subscripts of codes. This omission is also applicable to other data which is included in the after-mentioned hierarchically coded data DATA and to which a subscript is added.

(Slice Layer)

The slice layer defines a set of data referred to by the hierarchical video decoding apparatus 1 to decode the processing target slice S (hereinafter, also called a target slice). As shown in FIG. 9(c), the slice S includes a slice header SH, and slice data SDATA.

The slice header SH includes a coding parameter group referred to by the hierarchical video decoding apparatus 1 to define the method of decoding the target slice. For example, an active PPS identifier (slice_pic_parameter_set_id) that designates the PPS (active PPS) referred to in order to decode the target slice is included. The SPS referred to by the active PPS is designated by the active SPS identifier (pps_seq_parameter_set_id) included in the active PPS. Furthermore, the VPS (active VPS) referred to by the active SPS is designated by the active VPS identifier (sps_video_parameter_set_id) included in the active SPS.

For example, with reference to FIG. 10, sharing of the parameter set between layers in this embodiment (shared parameter set) is described. FIG. 10 represents the reference relationship between the header information and the coded data constituting the access unit (AU). In the example of FIG. 10, each of the slices constituting a picture belonging to the layers L # K (K=Nmin . . . Nmax) in each AU includes, in the slice header, the active PPS identifier that designates the PPS to be referred to, and designates the PPS (active PPS) to be used for decoding (this designation is also called activation) by means of the identifier when decoding of each slice is started. In the same picture, the coding parameters of the PPS, SPS and VPS referred to by each slice have to be the same. The activated PPS includes the active SPS identifier that identifies the SPS (active SPS) to be referred to for the decoding process, and the SPS (active SPS) used for decoding is identified (activated) by means of the identifier. Likewise, the activated SPS includes the active VPS identifier that identifies the VPS (active VPS) to be referred to for the process of decoding the sequence belonging to each layer, and the VPS (active VPS) used for decoding is identified (activated) by means of the identifier.

According to the above procedures, the parameter set required to perform the process of decoding the coded data on each layer is established. In the example of FIG. 10, it is assumed that the layer identifier of each of the parameter sets (VPS, SPS and PPS) is the lowest layer ID=L # Nmin belonging to a certain layer set. The slice having the layer ID=L # Nmin refers to the parameter set having the same layer ID. That is, in the example of FIG. 10, the slice with the layer ID=L # Nmin in AU # i refers to the PPS with the layer ID=L # Nmin and the PPS identifier=0. This PPS refers to the SPS with the layer ID=L # Nmin and the SPS identifier=0. This SPS refers to the VPS with the layer ID=L # Nmin and the VPS identifier=0. Meanwhile, the slice with the layer ID=L # K (K>Nmin) (L # Nmax in FIG. 10) in the AU # i can refer to the PPS and SPS having the same layer ID (=L # K), while this slice can also refer to the PPS and SPS on a layer L # M (K>M) (M=Nmin, L # Nmin in FIG. 10) lower than L # K. That is, the shared parameter set is referred to between layers. Consequently, the higher layer does not need to redundantly transmit a parameter set including the same coding parameters as those on the lower layer. Therefore, the amount of codes pertaining to the redundant parameter set can be reduced, and the amount of processing pertaining to decoding/coding can be reduced.

The identifier of the higher parameter set to be referred to by each piece of the header information (slice header, PPS and SPS) is not limited to the example of FIG. 10. In the case of VPS, the identifier may be selected from among the VPS identifiers k=0 . . . 15. In the case of SPS, the identifier may be selected from among the SPS identifiers m=0 . . . 15. In the case of PPS, the identifier may be selected from among the PPS identifiers n=0 . . . 63.

The slice type designation information (slice_type) that designates the slice type is an example of the coding parameters included in the slice header SH.

Slice types that can be designated by the slice type designation information include (1) I slice that only uses intra prediction during coding, (2) P slice that uses mono-directional prediction or intra prediction during coding, and (3) B slice that uses mono-directional prediction, bi-directional prediction, or intra prediction during coding.

(Slice Data Layer)

The slice data layer defines a set of data referred to by the hierarchical video decoding apparatus 1 to decode the processing target slice data SDATA. As shown in FIG. 9(d), the slice data SDATA includes a coded tree block (CTB: Coded Tree Block). The CTB is a fixed-size block (e.g., 64×64) in a slice, and is also called the largest coding unit (LCU).

(Coding Tree Layer)

As shown in FIG. 9(e), the coding tree layer defines the set of data referred to by the hierarchical video decoding apparatus 1 to decode the processing target coded tree block. The coded tree unit is split according to recursive quadtree splitting. The nodes of the tree structure obtained by the recursive quadtree splitting are called a coding tree. An intermediate node of the quadtree is a coded tree unit (CTU), and the coded tree block itself is defined as the highest CTU. The CTU includes a split flag (split_flag). When the split_flag is one, the coded tree unit CTU is split into four coded tree units CTU. When the split_flag is zero, the coded tree unit CTU is not split further and constitutes a coded unit (CU). The coded unit CU is an end node of the coding tree layer, and this layer is not split further. The coded unit CU serves as a basic unit of the coding process.

The size of the coded tree unit CTU and the possible sizes of the respective coded units depend on size designation information on the minimum coded node included in the sequence parameter set SPS, and the difference in hierarchical depth between the maximum coded node and the minimum coded node. For example, in the case where the size of the minimum coded node is 8×8 pixels and the difference in hierarchical depth between the maximum coded node and the minimum coded node is three, the size of the coded tree unit CTU is 64×64 pixels and the size of the coded node may be any of four sizes, that is, 64×64 pixels, 32×32 pixels, 16×16 pixels, and 8×8 pixels.
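The relationship in this example can be written out directly. The following sketch computes the CTU size and the possible coded node sizes from the two quantities named above; the variable names log2MinCbSize and maxDepthDiff are illustrative, not the syntax element names of the SPS.

    #include <stdio.h>

    int main(void) {
        int log2MinCbSize = 3;  /* minimum coded node 8x8 (2^3 = 8) */
        int maxDepthDiff  = 3;  /* hierarchical depth difference of three */
        int ctuSize = 1 << (log2MinCbSize + maxDepthDiff);   /* 64 */
        printf("CTU size: %dx%d\n", ctuSize, ctuSize);
        for (int d = 0; d <= maxDepthDiff; d++) {
            int s = ctuSize >> d;                            /* 64, 32, 16, 8 */
            printf("possible coded node size: %dx%d\n", s, s);
        }
        return 0;
    }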

A partial region on a target picture decoded from the coded tree unit is called a coding tree block (CTB). A CTB corresponding to a luma picture, which is the luma component of the target picture, is called a luma CTB. In other words, the partial region that is on the luma picture and decoded from the CTU is called a luma CTB. Meanwhile, a partial region corresponding to a color-difference picture and decoded from the CTU is called a color-difference CTB. Typically, once the color format of an image is determined, the luma CTB size and the color-difference CTB size can be converted into each other. For example, in the case where the color format is 4:2:2, the color-difference CTB width is half of the luma CTB width. In the following description, the CTB size means the luma CTB size, if not otherwise specified. The CTU size is the luma CTB size corresponding to the CTU.

(Coding Unit Layer)

As shown in FIG. 9(f), the coding unit layer defines the set of data referred to by the hierarchical video decoding apparatus 1 to decode the processing target coding unit. More specifically, the coding unit CU includes a CU header CUH, a predictive tree, and a transform tree. The CU header CUH defines, among others, whether the coded unit is a unit using intra prediction or a unit using inter prediction. The coded unit serves as the root of a prediction tree (PT) and a transform tree (TT). A region that is in a picture and corresponds to the CU is called a coding block (CB). The CB on the luma picture is called a luma CB, and the CB on a color-difference picture is called a color-difference CB. The CU size (size of the coded node) means the luma CB size.

(Transform Tree)

The transform tree (hereinafter, abbreviated as TT) has the coded unit CU split into one or more transform blocks, and defines the position and size of each transform block. In other words, the transform block is one or more non-overlapping regions that constitute the coded unit CU. The transform tree includes one or more transform blocks obtained through the aforementioned splitting. The information pertaining to the transform tree included in the CU and the information included in the transform tree are called TT information.

Splitting in the transform tree is classified into splitting that assigns a region having the same size as the coded unit as a transform block, and splitting according to recursive quadtree splitting as with the aforementioned tree block splitting. The transform process is performed for each transform block. Hereinafter, the transform block, which is a unit of transform, is also called a transform unit (TU).

The transform tree TT includes TT splitting information SP_TT that designates a pattern of splitting the target CU into each transform block, and quantization predictive residues QD₁ to QD_(NT) (NT is the total number of transform units TU included in the target CU).

More specifically, the TT splitting information SP_TT is information for determining the shape of each transform block included in the target CU, and the position in the target CU. For example, the TT splitting information SP_TT can be achieved using information (split_transform_unit_flag) indicating whether to split the target node or not, and information indicating the depth of splitting (transfoDepth). For example, in the case where the CU size is 64×64, each transform block obtained by splitting can have a size ranging from 32×32 pixels to 4×4 pixels.

Each quantization predictive residue QD is coded data generated by the hierarchical video coding apparatus 2 applying the following processes 1 to 3 to the target block, which is a transform block to be processed.

Process 1: Apply frequency transform (e.g., DCT transform (Discrete Cosine Transform) and DST transform (Discrete Sine Transform), etc.) to the predictive residue obtained by subtracting the predictive image from the coding target image.

Process 2: Quantize the transform coefficients obtained in the process 1.

Process 3: Variable-length code the transform coefficients quantized in the process 2.

The aforementioned quantization parameter qp represents the magnitude of the quantization step QP used by the hierarchical video coding apparatus 2 to quantize the transform coefficients (QP=2^(qp/6)).
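The relation QP=2^(qp/6) given above can be evaluated as follows; this is only a restatement of that formula in code.

    #include <math.h>

    /* Quantization step corresponding to the quantization parameter qp,
       following the relation QP = 2^(qp/6) stated above: the step doubles
       every time qp increases by 6. */
    double quant_step(int qp) {
        return pow(2.0, qp / 6.0);
    }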

(Predictive Tree)

The predictive tree (hereinafter, abbreviated as PT) has the coded unit CU split into one or more predictive blocks, and defines the position and size of each predictive block. In other words, the predictive block is one or more non-overlapping regions that constitute the coded unit CU. The predictive tree includes one or more predictive blocks obtained through the aforementioned splitting. The information pertaining to the predictive tree included in the CU and the information included in the predictive tree are called PT information.

The predicting process is performed for each predictive block. Hereinafter, the predictive block, which is the unit of prediction, is also called a prediction unit (PU).

The splitting types of the predictive tree are roughly classified into two cases, i.e., a case of intra prediction and a case of inter prediction. The intra prediction is prediction within the same picture. The inter prediction means a predicting process performed between pictures different from each other (e.g., between display times, or between layer images). That is, the inter prediction adopts, as the reference picture, either a reference picture on the same layer as the target layer (intra-layer reference picture) or a reference picture on the reference layer of the target layer (inter-layer reference picture), and generates the predictive image from the decoded image of the reference picture.

In the case of intra prediction, the splitting methods are 2N×2N (the same size as that of the coded unit) and N×N.

In the case of inter prediction, the splitting methods perform coding according to part_mode of the coded data, and these methods are 2N×2N (the same size as that of the coded unit), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, N×N, etc. Note that N=2^(m) (m is any integer of at least one). The number of splits is any of 1, 2 and 4. Consequently, the number of PUs included in the CU ranges from one to four. These PUs are sequentially represented as PU0, PU1, PU2 and PU3.
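The correspondence between the splitting mode and the number of PUs can be sketched as follows; the enumerator names mirror the list above but are illustrative, not the part_mode code values of the coded data.

    typedef enum {
        P_2Nx2N, P_2NxN, P_2NxnU, P_2NxnD,
        P_Nx2N, P_nLx2N, P_nRx2N, P_NxN
    } PartMode; /* illustrative names */

    /* Number of prediction units implied by each inter splitting mode. */
    int num_pus(PartMode m) {
        switch (m) {
        case P_2Nx2N: return 1;   /* same size as the coded unit */
        case P_NxN:   return 4;   /* quartered: PU0..PU3 */
        default:      return 2;   /* all two-way splittings */
        }
    }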

(Prediction Parameter)

The predictive image of the predictive unit is derived according to the prediction parameters accompanying the predictive unit. The prediction parameters include prediction parameters for intra prediction, and prediction parameters for inter prediction.

The intra prediction parameters are parameters for decoding the intra prediction (prediction mode) for each intra PU. The parameters for decoding the prediction mode include mpm_flag that is a flag pertaining to MPM (Most Probable Mode; the same applies hereafter), mpm_idx that is an index for selecting an MPM, and rem_idx that is an index for designating a prediction mode other than the MPMs. Here, the MPM is an estimated prediction mode having a high possibility of being selected in the target partition. For example, the estimated prediction mode estimated on the basis of the prediction mode assigned to a partition around the target partition, the DC mode typically having a high occurrence probability, and the Planar mode may be included in the MPMs. Hereinafter, in the case where it is simply represented as “prediction mode”, this representation indicates the luma prediction mode if not otherwise specified. The color-difference prediction mode is represented as “color-difference prediction mode” to discriminate this mode from the luma prediction mode. The parameters for decoding the prediction mode include chroma mode that is a parameter for designating the color-difference prediction mode.
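As an illustration of how the three syntax elements interact, the following sketch reconstructs a luma prediction mode from mpm_flag, mpm_idx and rem_idx, assuming a three-entry MPM candidate list cand[] has already been derived; the remapping of rem_idx around the sorted candidates follows the usual HEVC-style convention and is an assumption here, not a statement of this embodiment.

    /* Reconstruct the luma prediction mode from the decoded syntax elements. */
    int decode_luma_mode(int mpm_flag, int mpm_idx, int rem_idx, int cand[3]) {
        if (mpm_flag)
            return cand[mpm_idx];          /* one of the most probable modes */
        /* rem_idx designates a mode outside the MPM list: sort the candidates
           ascending and skip over them */
        for (int i = 0; i < 2; i++)
            for (int j = i + 1; j < 3; j++)
                if (cand[j] < cand[i]) { int t = cand[i]; cand[i] = cand[j]; cand[j] = t; }
        for (int i = 0; i < 3; i++)
            if (rem_idx >= cand[i])
                rem_idx++;
        return rem_idx;
    }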

The inter prediction parameters include prediction list usage flags predFlagL0 and predFlagL1, reference picture indices refIdxL0 and refIdxL1, and vectors mvL0 and mvL1. Each of the prediction list usage flags predFlagL0 and predFlagL1 is a flag indicating whether the reference picture lists called the L0 reference list and the L1 reference list are used or not. In the case where the value is one, the corresponding reference picture list is used. The case where the two reference picture lists are used, i.e., the case where predFlagL0=1 and predFlagL1=1, corresponds to bi-directional prediction. The case where one reference picture list is used, i.e., the case of (predFlagL0, predFlagL1)=(1, 0) or (predFlagL0, predFlagL1)=(0, 1), corresponds to single prediction.

The syntax elements for deriving the inter prediction parameters included in the coded data include, for example, the splitting mode part_mode, merge flag merge_flag, merge index merge_idx, inter prediction identifier inter_pred_idc, reference picture index refIdxLX, prediction vector index mvp_LX_idx, and difference vector mvdLX. Each value of the prediction list usage flags is derived on the basis of the inter prediction identifier as follows.

predFlagL0=inter prediction identifier & 1

predFlagL1=inter prediction identifier >> 1

Here, “&” is a bitwise AND (logical multiplication), and “>>” is a bitwise right shift.
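Written out in C, the derivation above becomes the following; inter_pred_idc is assumed to be coded so that bit 0 enables the L0 list and bit 1 enables the L1 list, matching the two expressions.

    /* Derive the prediction list usage flags from the inter prediction identifier. */
    void derive_pred_flags(int inter_pred_idc, int *predFlagL0, int *predFlagL1) {
        *predFlagL0 = inter_pred_idc & 1;    /* bitwise AND with 1 */
        *predFlagL1 = inter_pred_idc >> 1;   /* right shift by 1 */
    }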

(Examples of Reference Picture List)

An example of the reference picture list is now described. The reference picture list is a sequence made up of reference pictures stored in a decoded picture buffer. FIG. 11(a) is a diagram schematically showing examples of reference picture lists. In the reference picture list RPL0, five rectangles laterally arranged in a row indicate respective reference pictures. Symbols P1, P2, Q0, P3 and P4 shown from the left end toward the right are symbols for indicating the respective reference pictures. Likewise, in the reference picture list RPL1, symbols P4, P3, R0, P2 and P1 laterally arranged from the left end toward the right are symbols for indicating the respective reference pictures. “P” of P1 and the like indicates the target layer P. “Q” of Q0 indicates a layer Q different from the target layer P. Likewise, “R” of R0 indicates a layer R different from the target layer P and the layer Q. The subscripts attached to P, Q and R indicate the picture order number POC. A downward arrow immediately below refIdxL0 shows that the reference picture index refIdxL0 is an index that refers to the reference picture Q0 from the reference picture list RPL0 in the decoded picture buffer. Likewise, a downward arrow immediately below refIdxL1 shows that the reference picture index refIdxL1 is an index that refers to the reference picture P3 from the reference picture list RPL1 in the decoded picture buffer.

(Examples of Reference Picture)

Examples of reference pictures used for deriving vectors are now described. FIG. 11(b) is a diagram schematically showing examples of reference pictures. In FIG. 11(b), the abscissa axis indicates the display time, and the ordinate axis indicates the number of layers. The illustrated rectangles arranged in three vertical rows and three horizontal columns (nine in total) indicate respective pictures. Among the nine rectangles, the rectangle on the second column from the left on the low row indicates the decoding target picture (target picture), and the remaining eight rectangles show the respective reference pictures. The reference pictures Q2 and R2 indicated by the downward arrows from the target picture are pictures at the same display time as that of the target picture but on different layers. In inter-layer prediction with reference to the target picture curPic (P2), the reference picture Q2 or R2 is used. The reference picture P1 indicated by the left arrow from the target picture is a past picture on the same layer as the target picture. The reference picture P3 indicated by the right arrow from the target picture is a future picture on the same layer as the target picture. Motion prediction with reference to the target picture uses the reference picture P1 or P3.

(Merge Prediction and AMVP Prediction)

The methods of decoding (coding) the inter prediction parameters include a merge prediction (merge) mode, and an AMVP (Adaptive Motion Vector Prediction) mode. The merge flag merge_flag is a flag for identifying these modes. In each of the merge prediction mode and the AMVP mode, the prediction parameters of the target PU are derived using the prediction parameters of a block having already been processed. The merge prediction mode is a mode that uses the prediction parameters having already been derived, as they are, without including the prediction list usage flag predFlagLX (inter prediction identifier inter_pred_idc), reference picture index refIdxLX, and vector mvLX in the coded data. The AMVP mode is a mode that includes the inter prediction identifier inter_pred_idc, reference picture index refIdxLX, and vector mvLX in the coded data. The vector mvLX is coded as the prediction vector index mvp_LX_idx indicating the prediction vector, and the difference vector (mvdLX).

The inter prediction identifier inter_pred_idc is data indicating the types and the number of reference pictures, and has any of the values Pred_L0, Pred_L1 and Pred_Bi. Pred_L0 and Pred_L1 indicate use of a reference picture stored in the reference picture list called the L0 reference list or the L1 reference list, respectively. Both cases indicate use of a single reference picture (single prediction). Predictions using the L0 reference list and the L1 reference list are called L0 prediction and L1 prediction, respectively. Pred_Bi indicates use of two reference pictures (bi-prediction), i.e., use of two reference pictures stored in the L0 reference list and the L1 reference list. The prediction vector index mvp_LX_idx is an index that indicates the prediction vector, and the reference picture index refIdxLX is an index that indicates the reference picture stored in the reference picture list. LX is a description used in the case without discrimination between the L0 prediction and the L1 prediction. Replacement of LX with either L0 or L1 discriminates the parameter for the L0 reference list and the parameter for the L1 reference list from each other. For example, refIdxL0 is a representation of the reference picture index used for the L0 prediction, refIdxL1 is a representation of the reference picture index used for the L1 prediction, and refIdx (refIdxLX) is a representation used in the case without discrimination between refIdxL0 and refIdxL1.

The merge index merge_idx is an index indicating which prediction parameters among the prediction parameter candidates (merge candidates) derived from blocks having already been processed are used as the prediction parameters of the decoding target block.

(Motion Vector and Displacement Vector)

Vectors mvLX include a motion vector and a displacement vector (disparity vector). The motion vector is a vector indicating the positional deviation between the position of a block in a picture at a certain display time on a certain layer, and the position of the corresponding block on a picture on the same layer at a different display time (e.g., an adjacent discrete time). The displacement vector is a vector indicating the positional deviation between the position of a block in a picture at a certain display time on a certain layer, and the position of the corresponding block on a picture on a different layer at the same display time. The pictures on different layers encompass the case of a picture at the same resolution but a different quality, the case of a picture in a different viewpoint, and the case of a picture at a different resolution. In particular, a displacement vector corresponding to a picture in a different viewpoint is called a disparity vector. In the following description, in the case where the motion vector and the displacement vector are not discriminated from each other, the vector is simply called the vector mvLX. The prediction vector and the difference vector pertaining to the vector mvLX are called the prediction vector mvpLX and the difference vector mvdLX, respectively. Whether the vector mvLX and the difference vector mvdLX are motion vectors or displacement vectors is discriminated using the reference picture index refIdxLX accompanying the vector.

Each of the parameters having been described above may be separately coded. Alternatively, the parameters may be integrally coded in a complex manner. In the case of integrally coding the parameters in a complex manner, an index is assigned to a combination of parameter values, and the assigned index is coded. If a parameter can be derived from another parameter or decoded information, coding of the parameter concerned can be omitted.

[Hierarchical Video Decoding Apparatus]

The configuration of the hierarchical video decoding apparatus 1 according to this embodiment is hereinafter described with reference to FIGS. 19 to 21.

(Configuration of Hierarchical Video Decoding Apparatus)

The configuration of the hierarchical video decoding apparatus 1 according to this embodiment is described. FIG. 19 is a diagram schematically showing the configuration of the hierarchical video decoding apparatus 1 according to this embodiment. The hierarchical video decoding apparatus 1 decodes the hierarchically coded data DATA supplied from the hierarchical video coding apparatus 2 on the basis of the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, which is the decoding target and is included in the hierarchically coded data DATA supplied from the outside, and of the target highest temporal identifier HighestTidTarget that designates the highest sub-layer accompanying the decoding target layer, and generates the decoded image POUT # T on each layer included in the target layer set. That is, the hierarchical video decoding apparatus 1 decodes the coded data on pictures on layers in an ascending order from the lowest layer ID to the highest layer ID included in the target layer set, and generates the decoded images (decoded pictures). In other words, the coded data of the pictures on each layer is decoded in the order of the layer ID list LayerIdListTarget[0] . . . LayerIdListTarget[N−1] (N is the number of layers included in the target layer set) of the target layer set.

As shown in FIG. 19, the hierarchical video decoding apparatus 1 includes an NAL demultiplexing unit 11, and a target layer set picture decoding unit 10. Furthermore, the target layer set picture decoding unit 10 includes a parameter set decoding unit 12, a parameter set management unit 13, a picture decoding unit 14, and a decoded picture management unit 15. The NAL demultiplexing unit 11 further includes a bit stream extraction unit 17.

The hierarchically coded data DATA includes not only the NALs generated by the VCL, but also NALs that include the parameter sets (VPS, SPS and PPS), SEI and the like. These NALs are called non-VCL NALs (non-VCLs), which are discriminated from the VCL NALs.

In a schematic view, the bit stream extraction unit 17 included in the NAL demultiplexing unit 11 performs a bit stream extraction process on the basis of the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside, and extracts, from the hierarchically coded data DATA, the target layer set coded data DATA # T (BitstreamToDecode) made up of the NAL units included in the set (called a target set TargetSet) defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget. The details of the processes in the bit stream extraction unit 17 having a high relationship with the present disclosure are described later.

Subsequently, the NAL demultiplexing unit 11 demultiplexes the target layer set coded data DATA # T (BitstreamToDecode) extracted by the bit stream extraction unit 17, refers to the NAL unit type, the layer identifier (layer ID), and the temporal identifier (temporal ID) included in the NAL unit, and supplies the NAL unit included in the target layer set to the target layer set picture decoding unit 10.

The target layer set picture decoding unit 10 supplies, among the NALs included in the supplied target layer set coded data DATA # T, the non-VCL NALs to the parameter set decoding unit 12, and the VCL NALs to the picture decoding unit 14. That is, the target layer set picture decoding unit 10 decodes the header (NAL unit header) of the supplied NAL unit and, on the basis of the NAL unit type, the layer identifier and the temporal identifier included in the decoded NAL unit header, supplies the non-VCL coded data to the parameter set decoding unit 12 and the VCL coded data to the picture decoding unit 14, together with the decoded NAL unit type, layer identifier, and temporal identifier.

The parameter set decoding unit 12 decodes the input non-VCL NAL to obtain the parameter sets, i.e., the VPS, the SPS and the PPS, and supplies these sets to the parameter set management unit 13. The details of the processes in the parameter set decoding unit 12 having a high relationship with the present disclosure are described later.

The parameter set management unit 13 holds the decoded parameter sets for each of the identifiers of the parameter sets, and thus holds the coding parameters of the parameter sets. More specifically, in the case of the VPS, the VPS coding parameters are held for each VPS identifier (video_parameter_set_id). In the case of the SPS, the SPS coding parameters are held for each SPS identifier (sps_seq_parameter_set_id). In the case of the PPS, the PPS coding parameters are held for each PPS identifier (pps_pic_parameter_set_id).

The parameter set management unit 13 supplies the after-mentioned picture decoding unit 14 with the coding parameters of the parameter sets (active parameter sets) referred to by the picture decoding unit 14 for picture decoding. More specifically, first, the active PPS is designated by means of the active PPS identifier (slice_pic_parameter_set_id) included in the slice header SH decoded by the picture decoding unit 14. Next, the active SPS is designated by means of the active SPS identifier (pps_seq_parameter_set_id) included in the designated active PPS. Finally, the active VPS is designated by means of the active VPS identifier (sps_video_parameter_set_id) included in the active SPS. Subsequently, the coding parameters of the designated active PPS, active SPS, and active VPS are supplied to the picture decoding unit 14. The designation of the parameter set referred to for picture decoding is also called “activation of the parameter set”. For example, the designations of the active PPS, active SPS, and active VPS are called “activation of PPS”, “activation of SPS” and “activation of VPS”, respectively.
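The activation cascade described above (slice header to PPS, PPS to SPS, SPS to VPS) can be sketched as a chain of table lookups. The tables below stand in for the storage held by the parameter set management unit 13, dimensioned by the identifier ranges of the respective parameter sets; the structure and function names are illustrative.

    typedef struct { int dummy; } Vps;                     /* illustrative */
    typedef struct { int sps_video_parameter_set_id; } Sps;
    typedef struct { int pps_seq_parameter_set_id; } Pps;

    static Pps pps_table[64];  /* PPS identifiers 0..63 */
    static Sps sps_table[16];  /* SPS identifiers 0..15 */
    static Vps vps_table[16];  /* VPS identifiers 0..15 */

    /* Designate (activate) the parameter sets for the current slice. */
    void activate(int slice_pic_parameter_set_id,
                  const Pps **pps, const Sps **sps, const Vps **vps) {
        *pps = &pps_table[slice_pic_parameter_set_id];          /* activate PPS */
        *sps = &sps_table[(*pps)->pps_seq_parameter_set_id];    /* activate SPS */
        *vps = &vps_table[(*sps)->sps_video_parameter_set_id];  /* activate VPS */
    }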

The picture decoding unit 14 generates the decoded picture on the basis of the input VCL NAL, the active parameter sets (active PPS, active SPS and active VPS) and the reference picture, and supplies the picture to the decoded picture management unit 15. The supplied decoded picture is recorded in a buffer in the decoded picture management unit 15. The details of the picture decoding unit 14 are described later.

The decoded picture management unit 15 records the input decoded picture in the internal decoded picture buffer (DPB), while generating the reference picture list and determining the output picture. The decoded picture management unit 15 outputs the decoded picture recorded in the DPB, as an output picture POUT # T, at a predetermined timing, to the outside.

(Bit Stream Extraction Unit 17)

The bit stream extraction unit 17 performs the bit stream extraction process on the basis of the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside, removes (discards), from the input hierarchically coded data DATA, the NAL units that are not included in the set (called a target set TargetSet) defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, and thus extracts and outputs the target layer set coded data DATA # T (BitstreamToDecode) made up of the NAL units included in the target set TargetSet.

Hereinafter, referring to FIG. 27, the schematic operation of the bit stream extraction unit 17 according to this embodiment is described. FIG. 27 is a flowchart showing the bit stream extraction process in units of the access unit in the bit stream extraction unit 17.

(SG101) The bit stream extraction unit 17 decodes the NAL unit header of the supplied target NAL unit according to the syntax table shown in FIG. 6(b). That is, an NAL unit type (nal_unit_type), a layer identifier (nuh_layer_id) and a temporal identifier (nuh_temporal_id_plus1) are decoded. The layer identifier nuhLayerId of the target NAL unit is set to “nuh_layer_id”. The temporal identifier temporalId of the target NAL unit is set to “nuh_temporal_id_plus1−1”.

(SG102) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. Here, to satisfy the conformance condition CX1, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32) or not. That is, in the case where the NAL unit type is the video parameter set (YES in SG102), the processing transitions to step SG103. In the other case (No in SG102), the processing transitions to step SG105.

(SG103) It is determined whether the layer identifier of the target NAL unit is included in the target set or not. More specifically, it is determined whether or not a value identical to the layer identifier of the target NAL unit is in the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget. In the case where the layer identifier of the target NAL unit is in LayerIdListTarget (YES in SG103), the processing transitions to step SG105. In the other case (No in SG103), that is, in the case where the layer identifier of the target NAL unit is not in LayerIdListTarget, the processing transitions to step SG104.

(SG104) The layer identifier of the target NAL unit is updated to the lowest layer identifier in the LayerIdListTarget. That is, the identifier is updated to “nuh_layer_id=LayerIdListTarget[0]”.

(SG105) It is determined whether or not the layer identifier and the temporal identifier of the target NAL unit are excluded from the target set TargetSet, on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget, and of the target highest temporal identifier. More specifically, it is determined whether the following conditions (1) and (2) are satisfied or not. In the case where at least one of the conditions is satisfied (true) (YES in SG105), the processing transitions to step SG106. In the other case (No in SG105), the processing transitions to step SG107.

(1) In the case where “no value identical to the layer identifier of the target NAL unit is in the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget”, it is determined to be true. In the other case (a value identical to the layer identifier of the target NAL unit is in the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget), it is determined to be false.

(2) In the case where “the temporal identifier of the target NAL unit is greater than the target highest temporal identifier HighestTidTarget”, it is determined to be true. In the other case (the temporal identifier of the target NAL unit is equal to or less than the target highest temporal identifier HighestTidTarget), it is determined to be false.

(SG106) The target NAL unit is discarded. That is, as the target NAL unit is not included in the target set TargetSet, the bit stream extraction unit 17 removes the target NAL unit from the input hierarchically coded data DATA.

(SG107) It is determined whether any unprocessed NAL unit is in the same access unit or not. In the case where any unprocessed NAL unit is present (YES in SG107), the processing transitions to step SG101 to continue the bit stream extraction in units of the NAL units constituting the target access unit. In the other case (No in SG107), the processing transitions to step SG10A.

(SG10A) It is determined whether the access unit next to the target access unit is in the input hierarchically coded data DATA or not. In the case where the next access unit is present (YES in SG10A), the processing transitions to step SG101 to continue the process for the next access unit. In the case where the next access unit is absent (No in SG10A), the bit stream extraction process is finished.

The operation of the bit stream extraction unit 17 according to Embodiment 1 has thus been described above. The steps are not limited to the above steps. Alternatively, the steps may be changed in an implementable range.
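For illustration, steps SG101 to SG106 for one access unit can be sketched as follows. The NAL unit header layout (one forbidden bit, six bits of nal_unit_type, six bits of nuh_layer_id, three bits of nuh_temporal_id_plus1) follows the HEVC NAL unit header; the NalUnit structure, the keep flag and the in_list() helper are illustrative, and LayerIdListTarget is assumed to be sorted in an ascending order so that LayerIdListTarget[0] is the lowest layer identifier.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        int nal_unit_type, nuh_layer_id, temporal_id;
        bool keep;                       /* false: removed in step SG106 */
    } NalUnit;                           /* illustrative */
    #define NAL_VPS 32

    /* SG101: decode the two-byte NAL unit header. */
    void parse_nal_header(const uint8_t hdr[2], NalUnit *nu) {
        nu->nal_unit_type = (hdr[0] >> 1) & 0x3F;
        nu->nuh_layer_id  = ((hdr[0] & 1) << 5) | (hdr[1] >> 3);
        nu->temporal_id   = (hdr[1] & 0x07) - 1;  /* nuh_temporal_id_plus1 - 1 */
    }

    static bool in_list(const int *ids, int n, int id) {
        for (int i = 0; i < n; i++)
            if (ids[i] == id) return true;
        return false;
    }

    /* SG102 to SG106 for every NAL unit of one access unit. */
    void extract_au(NalUnit *au, int nNal, const int *LayerIdListTarget,
                    int nLayers, int HighestTidTarget) {
        for (int i = 0; i < nNal; i++) {
            NalUnit *nu = &au[i];
            /* SG102/SG103/SG104: rewrite the layer identifier of a VPS that
               is outside the target set to the lowest target layer identifier */
            if (nu->nal_unit_type == NAL_VPS &&
                !in_list(LayerIdListTarget, nLayers, nu->nuh_layer_id))
                nu->nuh_layer_id = LayerIdListTarget[0];
            /* SG105/SG106: keep only NAL units belonging to the target set */
            nu->keep = in_list(LayerIdListTarget, nLayers, nu->nuh_layer_id) &&
                       nu->temporal_id <= HighestTidTarget;
        }
    }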

The bit stream extraction unit 17, which has been described above, performs the bit stream extraction process on the basis of the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside, removes (discards), from the input hierarchically coded data DATA, the NAL units that are not included in the set (called a target set TargetSet) defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, and thus extracts and outputs the target layer set coded data DATA # T (BitstreamToDecode) made up of the NAL units included in the target set TargetSet. Furthermore, the bit stream extraction unit 17 is characterized in that, in the case where the layer identifier of the video parameter set is not included in the target set TargetSet, it updates (rewrites) the layer identifier of the video parameter set to the lowest layer identifier in the target set TargetSet.

The operation of the bit stream extraction unit 17 assumes that “the AU constituting the input hierarchically coded data DATA includes at most one VPS having the lowest layer identifier in the AU”. However, the operation is not limited thereto. For example, a VPS having a layer identifier other than the lowest layer identifier in the AU may be included in the AU. In this case, in step SG104, the bit stream extraction unit 17 may regard, as the VPS whose layer identifier is to be updated, the VPS having the lowest layer identifier among the VPSs whose layer identifiers are not included in the target set TargetSet. Typically, since the VPS having the layer identifier “nuhLayerId=0” is the VPS having the lowest layer identifier, this VPS is regarded as the one to be updated, and the other VPSs that are not included in the target set TargetSet are discarded.

Consequently, the bit stream extraction unit 17 according to this embodiment described above can prevent the problem in that the VPS is not included in the layer set in the bit stream after bit stream extraction. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream on a certain layer set and only includes a layer set that is a subset of the layer set.

(Parameter Set Decoding Unit 12)

The parameter set decoding unit 12 decodes the input target layer set coded data to obtain the parameter sets (VPS, SPS and PPS) to be used to decode the target layer set. The coding parameters of the decoded parameter sets are supplied to the parameter set management unit 13, and recorded with respect to the identifier of each parameter set.

Typically, the parameter set is decoded on the basis of a predetermined syntax table. That is, a bit sequence is read from the coded data according to the procedures defined in the syntax table, and decoded to obtain the syntax value of the syntax included in the syntax table. A variable may be derived on the basis of the decoded syntax value and included in the parameter set to be output, if necessary. Consequently, the parameter set output from the parameter set decoding unit 12 can be represented as the syntax values of the syntax pertaining to the parameter sets (VPS, SPS and PPS) included in the coded data, and a set of variables derived from the syntax values.

Hereinafter, the syntax tables having a high relationship with the present disclosure among the syntax tables used for decoding in the parameter set decoding unit 12 are mainly described.

(Video Parameter Set VPS)

The video parameter set VPS is a parameter set for defining the parameters common to multiple layers, and includes, as layer information, the VPS identifier for identifying each VPS, information on the maximum number of layers, layer set information, and inter-layer dependence information.

The VPS identifier is an identifier for identifying each VPS, and is included in the VPS as the syntax “video_parameter_set_id” (SYNVPS01 in FIG. 12). The VPS identified by the active VPS identifier (sps_video_parameter_set_id) included in the after-mentioned SPS is referred to during the process of decoding the coded data on the target layer in the target layer set.

The information on the maximum number of layers is information that indicates the maximum number of layers in the hierarchically coded data, and is included as the syntax “vps_max_layers_minus1” (SYNVPS02 in FIG. 12) in the VPS. The maximum number of layers in the hierarchically coded data (hereinafter, the maximum number of layers MaxNumLayers) is set to a value of (vps_max_layers_minus1+1). The maximum number of layers defined here is the maximum number of layers pertaining not to temporal scalability but to the other scalabilities (SNR scalability, spatial scalability, view scalability, etc.).

The information on the maximum number of sub-layers is information that indicates the maximum number of sub-layers in the hierarchically coded data, and is included as the syntax “vps_max_sub_layers_minus1” (SYNVPS03 in FIG. 12) in the VPS. The maximum number of sub-layers in the hierarchically coded data (hereinafter, the maximum number of sub-layers MaxNumSubLayers) is set to a value of (vps_max_sub_layers_minus1+1). The maximum number of sub-layers defined here is the maximum number of layers pertaining to temporal scalability.

The maximum layer identifier information is information that indicates the layer identifier (layer ID) of the highest layer included in the hierarchically coded data, and is included as the syntax “vps_max_layer_id” (SYNVPS04 in FIG. 12) in the VPS. In other words, the information is the maximum value of the layer ID (nuh_layer_id) of the NAL units included in the hierarchically coded data.

The information on the number of layer sets is information that indicates the total number of layer sets in the hierarchically coded data, and is included as the syntax “vps_num_layer_sets_minus1” (SYNVPS05 in FIG. 12) in the VPS. The number of layer sets in the hierarchically coded data (hereinafter, the number of layer sets NumLayerSets) is set to a value of (vps_num_layer_sets_minus1+1).

The layer set information is a list that represents the set of layers constituting each layer set included in the hierarchically coded data (hereinafter, the layer ID list LayerSetLayerIdList), and is decoded from the VPS. The VPS includes the syntax “layer_id_included_flag[i][j]” (SYNVPS06 in FIG. 12) indicating whether the j-th layer (layer identifier numLayerIdJ) is included in the i-th layer set or not. The i-th layer set is made up of the layers for which this syntax value is one. That is, a layer j constituting the layer set i is included in the layer ID list LayerSetLayerIdList[i].
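A sketch of this derivation is the following; the bound of 64 layer identifiers reflects the six-bit nuh_layer_id, and the array shapes and function name are illustrative.

    /* Derive the layer ID list of each layer set from layer_id_included_flag. */
    void derive_layer_sets(int NumLayerSets, int vps_max_layer_id,
                           const unsigned char layer_id_included_flag[][64],
                           int LayerSetLayerIdList[][64], int NumLayersInIdList[]) {
        for (int i = 0; i < NumLayerSets; i++) {
            int n = 0;
            for (int j = 0; j <= vps_max_layer_id; j++)
                if (layer_id_included_flag[i][j])
                    LayerSetLayerIdList[i][n++] = j;  /* layer j belongs to set i */
            NumLayersInIdList[i] = n;
        }
    }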

The VPS extension data presence/absence flag “vps_extension_flag” (SYNVPS07 in FIG. 12) is a flag representing whether the VPS further includes the VPS extension data vps_extension( ) or not (SYNVPS08 in FIG. 12). In this Description, in the case of describing a “flag indicating whether to be XX or not” or an “XX presence/absence flag”, one is regarded as the case of presence of XX, and zero is regarded as the case of absence of XX. In logical negation, logical multiplication or the like, one is treated as true, and zero is treated as false (the same applies hereafter). Note that an actual apparatus and method may use another value as the true value or the false value.

The inter-layer dependence information is decoded from the VPS extension data (vps_extension( )) included in the VPS. The inter-layer dependence information included in the VPS extension data is described with reference to FIG. 13. FIG. 13 shows the part of the syntax table that is referred to during VPS extension decoding and pertains to the inter-layer dependence information.

The VPS extension data (vps_extension( )) includes the direct dependency flag “direct_dependency_flag[i][j]” (SYNVPS0A in FIG. 13) as the inter-layer dependence information. The direct dependency flag direct_dependency_flag[i][j] represents whether the i-th layer directly depends on the j-th layer or not. In the case of direct dependence, the flag has a value of one. In the case without direct dependence, the flag has a value of zero. Here, in the case where the i-th layer directly depends on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process, there is a possibility that the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are directly referred to by the target layer. On the contrary, in the case where the i-th layer does not directly depend on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process, the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are not directly referred to. In other words, when the direct dependency flag of the i-th layer to the j-th layer is one, the j-th layer can be a direct reference layer of the i-th layer. A set of layers that can be a direct reference layer to a certain layer, that is, a set of layers with the corresponding direct dependency flag having a value of one, is called a direct dependency layer set. The layer with i=0, i.e., the 0-th layer (basic layer), has no direct dependency relationship with the j-th layer (extended layer). Consequently, the direct dependency flag “direct_dependency_flag[i][j]” has a value of zero. As indicated by the fact that the i-loop including SYNVPS0A in FIG. 13 starts at one, decoding/coding of the direct dependency flag of the j-th layer (extended layer) to the 0-th layer (basic layer) may be omitted.

Here, the reference layer ID list RefLayerId[iNuhLId][ ] that indicates the direct reference layer set of the i-th layer (layer identifier iNuhLId=numLayerIdI), and the direct reference layer IDX list DirectRefLayerIdx[iNuhLId][ ] that represents the sequential number of the element, in an ascending order, of the j-th layer that is a reference layer of the i-th layer in the direct reference layer set, are derived by the following expression. The reference layer ID list RefLayerId[ ][ ] is a two-dimensional array. In the first array element, the layer identifier of the target layer (layer i) is stored. In the second array element, the layer identifier of the k-th reference layer in an ascending order in the direct reference layer set is stored. The direct reference layer IDX list DirectRefLayerIdx[ ][ ] is a two-dimensional array. In the first array element, the layer identifier of the target layer (layer i) is stored. In the second array element, the index (direct reference layer IDX) that represents the sequential number of the element, in an ascending order of the layer identifiers, in the direct reference layer set is stored.

The reference layer ID list and the direct reference layer IDX list are derived according to the following pseudocode. The layer identifier nuhLayerId of the i-th layer is represented according to the syntax “layer_id_in_nuh[i]” (not shown in FIG. 13) on the VPS. Hereinafter, to shorten the representation of the layer identifier “layer_id_in_nuh[i]” of the i-th layer, it is represented as “nuhLId # i”; in the case of layer_id_in_nuh[j], as “nuhLId # j”. The array NumDirectRefLayers[ ] represents the number of direct reference layers referred to by the layer having the layer identifier iNuhLId.

(Derivation of Reference Layer ID List and Direct Reference Layer IDXList)

The reference layer ID list and the direct reference layer IDX list are derived according to the following pseudocode.

    for( i = 0; i < vps_max_layers_minus1+1; i++ ) {
        iNuhLId = nuhLId#i;
        NumDirectRefLayers[iNuhLId] = 0;
        for( j = 0; j < i; j++ ) {
            if( direct_dependency_flag[i][j] ) {
                RefLayerId[iNuhLId][NumDirectRefLayers[iNuhLId]] = nuhLId#j;
                NumDirectRefLayers[iNuhLId]++;
                DirectRefLayerIdx[iNuhLId][nuhLId#j] = NumDirectRefLayers[iNuhLId] - 1;
            }
        } // end of loop on for( j = 0; j < i; j++ )
    } // end of loop on for( i = 0; i < vps_max_layers_minus1+1; i++ )

The pseudocode can be represented in steps as follows.

(SL01) The start point of a loop pertaining to the derivation of the reference layer ID list and the direct reference layer IDX list pertaining to the i-th layer. Before the loop is started, the variable i is initialized to zero. The process in the loop is executed when the variable i is less than the number of layers “vps_max_layers_minus1+1”. The variable i is incremented by “1” every time the process in the loop is executed one time.

(SL02) The layer identifier nuhLId # i of the i-th layer is set in the variable iNuhLId. Furthermore, the number of direct reference layers NumDirectRefLayers[iNuhLId] of the layer identifier nuhLId # i is set to zero.

(SL03) The start point of a loop pertaining to the addition of elements on the j-th layer to the reference layer ID list and the direct reference layer IDX list pertaining to the i-th layer. Before the loop is started, the variable j is initialized to zero. The process in the loop is executed when the variable j (j-th layer) is less than the variable i (i-th layer) (j<i). The variable j is incremented by “1” every time the process in the loop is executed one time.

(SL04) The direct dependency flag (direct_dependency_flag[i][j]) of the j-th layer to the i-th layer is determined. When the direct dependency flag is one, the processing transitions to step SL05 to execute the processes in steps SL05 to SL07. When the direct dependency flag is zero, the processing transitions to step SL0A, skipping the processes in steps SL05 to SL07.

(SL05) The layer identifier nuhLId # j is set in the (NumDirectRefLayers[iNuhLId])-th element of the reference layer ID list RefLayerId[iNuhLId][ ]. That is, RefLayerId[iNuhLId][NumDirectRefLayers[iNuhLId]]=nuhLId # j;

(SL06) The value of the number of direct reference layers NumDirectRefLayers[iNuhLId] is incremented by “1”. That is, NumDirectRefLayers[iNuhLId]++;

(SL07) A value of “the number of direct reference layers−1” is set, as the direct reference layer index (direct reference layer IDX), in the (nuhLId # j)-th element of the direct reference layer IDX list DirectRefLayerIdx[iNuhLId][ ]. That is, DirectRefLayerIdx[iNuhLId][nuhLId # j]=NumDirectRefLayers[iNuhLId]−1;

(SL0A) The end point of the loop pertaining to the addition of elements on the j-th layer to the reference layer ID list and the direct reference layer IDX list pertaining to the i-th layer.

(SL0B) The end point of the loop pertaining to the derivation of the reference layer ID list and the direct reference layer IDX list on the i-th layer.

Through use of the reference layer ID list and the direct reference layer IDX list, which have been described above, the layer ID of the k-th element of the direct reference layer set can be grasped and, conversely, the sequential number of the element (direct reference layer IDX) in the direct reference layer set indicated by a layer ID can be grasped. The deriving procedures are not limited to the aforementioned steps, and may be changed in an implementable range.

The inter-layer dependence information contains the syntax “direct_dependency_len_minusN” (layer dependency type bit length) (SYNVPS0C in FIG. 13) that indicates the bit length M of the after-mentioned layer dependency type (direct_dependency_type[i][j]). Here, N is a value defined by the total number of kinds of layer dependency types, and is an integer of at least two. The maximum bit length M is, for example, 32. Provided that N=2, the range of direct_dependency_type[i][j] is from 0 to (2^32−2). More generally, with the bit length M and the value N defined by the total number of layer dependency types, the range of direct_dependency_type[i][j] is from 0 to (2^M−N).

The inter-layer dependence information contains the syntax “direct_dependency_type[i][j]” (SYNVPS0D in FIG. 13) that indicates the layer dependency type representing the reference relationship between the i-th layer and the j-th layer. More specifically, when the direct dependency flag direct_dependency_flag[i][j] is one, each bit value of the layer dependency type (DirectDepType[i][j]=direct_dependency_type[i][j]+1) indicates the presence/absence flag of a kind of layer dependency type on the j-th layer, which is the reference layer of the i-th layer. For example, the presence/absence flags of the kinds of the layer dependency types include the presence/absence flag of the inter-layer image prediction (SamplePredEnabledFlag; inter-layer image prediction presence/absence flag), the presence/absence flag of the inter-layer motion prediction (MotionPredEnabledFlag; inter-layer motion prediction presence/absence flag), and the presence/absence flag of the non-VCL dependency (NonVCLDepEnabledFlag; non-VCL dependency presence/absence flag). The presence/absence flag of the non-VCL dependency indicates the presence or absence of the inter-layer dependency relationship pertaining to the header information (parameter sets of SPS, PPS, etc.) included in the non-VCL NAL unit. For example, the presence or absence of the sharing of the parameter set between layers (shared parameter set), which will be described later, and the presence or absence of the prediction of a part of the syntax in the parameter set between layers (e.g., scaling list information (quantization matrix), etc.) (also called syntax prediction between parameter sets, or prediction between parameter sets) are included. The value coded according to the syntax “direct_dependency_type[i][j]” is the value of the layer dependency type−1, i.e., the value of “DirectDepType[i][j]−1” in the example of FIG. 14.

Here, FIG. 14(a) shows an example of the correspondence between the value of the layer dependency type according to this embodiment (DirectDepType[i][j]=direct_dependency_type[i][j]+1) and the kinds of layer dependency types. As shown in FIG. 14(a), the value of the least significant bit (bit 0) of the layer dependency type indicates the presence or absence of the inter-layer image prediction. The value of the first bit above the least significant bit indicates the presence or absence of the inter-layer motion prediction. The value of the (N−1)-th bit above the least significant bit indicates the presence or absence of the non-VCL dependency. Each of the bits between the N-th bit from the least significant bit and the most significant bit (the (M−1)-th bit) is a bit reserved for extension of the dependency type.

The presence/absence flag of each layer dependency type on the reference layer j to the target layer i (layer identifier iNuhLId=nuhLId # i) is derived by the following expressions.

SamplePredEnabledFlag[iNuhLId][j]=((direct_dependency_type[i][j]+1) & 1);

MotionPredEnabledFlag[iNuhLId][j]=((direct_dependency_type[i][j]+1) & 2)>>1;

NonVCLDepEnabledFlag[iNuhLId][j]=((direct_dependency_type[i][j]+1) & (1<<(N−1)))>>(N−1);

Alternatively, through use of the variable DirectDepType[i][j] instead of (direct_dependency_type[i][j]+1), the flags can be represented according to the following expressions.

SamplePredEnabledFlag[iNuhLId][j]=((DirectDepType[i][j]) & 1);

MotionPredEnabledFlag[iNuhLId][j]=((DirectDepType[i][j]) & 2)>>1;

NonVCLDepEnabledFlag[iNuhLId][j]=((DirectDepType[i][j]) & (1<<(N−1)))>>(N−1);
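
The three expressions can be collected into a single non-normative helper, shown below; the function name and parameter passing are illustrative, and the bit layout is that of FIG. 14(a).

    /* Derive the three presence/absence flags from direct_dependency_type[i][j],
       assuming the FIG. 14(a) bit layout with N kinds of dependency types. */
    void derive_enabled_flags(unsigned directDependencyType, unsigned N,
                              int *samplePred, int *motionPred, int *nonVclDep)
    {
        unsigned depType = directDependencyType + 1;  /* DirectDepType[i][j] */
        *samplePred = depType & 1;                    /* bit 0 */
        *motionPred = (depType >> 1) & 1;             /* bit 1 */
        *nonVclDep  = (depType >> (N - 1)) & 1;       /* bit N-1 */
    }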

In the example of FIG. 14(a), the (N−1)-th bit is configured to be the non-VCL dependency type (non-VCL dependency presence/absence flag). However, the configuration is not limited thereto. For example, it may be configured so that N=3, and the second bit from the least significant bit represents the presence or absence of the non-VCL dependency type. The position of the bit representing the presence/absence flag for each dependency type may be changed within an implementable range. Each presence/absence flag may be derived in the aforementioned derivation of the reference layer ID list and the direct reference layer IDX list, adopted and executed as step SL08. The deriving procedures are not limited to the aforementioned steps, and may be changed in an implementable range.

(Derivation of Indirect Dependency Flag and Dependency Flag)

Here, the indirect dependency flag (IndirectDependencyFlag[i][j]), which represents the dependency relationship on whether the i-th layer indirectly depends on the j-th layer or not (whether the j-th layer is the indirect reference layer of the i-th layer or not), can be derived by the after-mentioned pseudocode with reference to the direct dependency flag (direct_dependency_flag[i][j]). Likewise, the dependency flag (DependencyFlag[i][j]), which represents the dependency relationship on whether the i-th layer directly depends on the j-th layer (when the direct dependency flag is one, the j-th layer is also called the direct reference layer of the i-th layer) or indirectly depends thereon (when the indirect dependency flag is one, the j-th layer is also called the indirect reference layer of the i-th layer), can be derived by the after-mentioned pseudocode with reference to the direct dependency flag (direct_dependency_flag[i][j]) and the indirect dependency flag (IndirectDependencyFlag[i][j]). Here, referring to FIG. 31, the indirect reference layer is described. In FIG. 31, the number of layers is N+1. The j-th layer (L # j in FIG. 31; called layer j) is a lower layer than the i-th layer (L # i in FIG. 31; called layer i) (j<i). It is also assumed that there is a layer k that is higher than the layer j and lower than the layer i (L # k in FIG. 31) (j<k<i). In FIG. 31, the layer k directly depends on the layer j (a solid-line arrow in FIG. 31; the layer j is the direct reference layer of the layer k, direct_dependency_flag[k][j]==1), and the layer i directly depends on the layer k (the layer k is the direct reference layer of the layer i, direct_dependency_flag[i][k]==1). Here, the layer i indirectly depends on the layer j through the layer k (a broken arrow in FIG. 31). Consequently, the layer j is called the indirect reference layer of the layer i. In other words, in the case where the layer i depends on the layer j through one or more layers k (j<k<i), the layer j is the indirect reference layer of the layer i.

The indirect dependency flag IndirectDependencyFlag[i][j] represents whether the i-th layer indirectly depends on the j-th layer or not. In the case of indirect dependence, the flag has a value of one. In the case without indirect dependence, the flag has a value of zero. Here, the case where the i-th layer indirectly depends on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process means that there is a possibility that the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are indirectly referred to by the target layer. On the contrary, the case where the i-th layer does not indirectly depend on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process means that the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are not indirectly referred to. In other words, when the indirect dependency flag of the i-th layer to the j-th layer is one, the j-th layer can be an indirect reference layer of the i-th layer. A set of layers that can be an indirect reference layer to a certain layer, that is, the set of layers whose corresponding indirect dependency flag has a value of one, is called an indirect dependency layer set. The layer with i=0, i.e., the 0-th layer (basic layer), has no indirect dependency relationship with the j-th layer (extended layer). Consequently, the indirect dependency flag “IndirectDependencyFlag[0][j]” has a value of zero. Derivation of the indirect dependency flag of the j-th layer (extended layer) to the 0-th layer (basic layer) can be omitted.

The dependency flag DependencyFlag[i][j] represents whether the i-th layer depends on the j-th layer or not. In the case of dependence, the flag has a value of one. In the case without dependence, the flag has a value of zero. Reference and dependency pertaining to the dependency flag DependencyFlag[i][j] include both the direct and indirect manners (direct reference, indirect reference, direct dependency, and indirect dependency), if not otherwise specified. Here, the case where the i-th layer depends on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process means that there is a possibility that the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are referred to by the target layer. On the contrary, the case where the i-th layer does not depend on the j-th layer and the i-th layer is regarded as the target layer to be subjected to the decoding process means that the parameter set pertaining to the j-th layer, the decoded picture, and the related decoded syntax are not referred to. In other words, when the dependency flag of the i-th layer to the j-th layer is one, the j-th layer can be a direct reference layer or an indirect reference layer of the i-th layer. A set of layers that can be a direct reference layer or an indirect reference layer to a certain layer, that is, the set of layers whose corresponding dependency flag has a value of one, is called a dependency layer set. The layer with i=0, i.e., the 0-th layer (basic layer), has no dependency relationship with the j-th layer (extended layer). Consequently, the dependency flag “DependencyFlag[0][j]” has a value of zero. Derivation of the dependency flag of the j-th layer (extended layer) to the 0-th layer (basic layer) can be omitted.

(Pseudocode)

    for(i=0; i<vps_max_layers_minus1+1; i++){
      for(j=0; j<i; j++){
        IndirectDependencyFlag[i][j] = 0;
        DependencyFlag[i][j] = 0;
        for(k=j+1; k<i; k++){
          if( direct_dependency_flag[k][j] && direct_dependency_flag[i][k] && !direct_dependency_flag[i][j] ){
            IndirectDependencyFlag[i][j] = 1;
          }
        }
        DependencyFlag[i][j] = (direct_dependency_flag[i][j] | IndirectDependencyFlag[i][j]);
      } // end of loop on for(j=0; j<i; j++)
    } // end of loop on for(i=0; i<vps_max_layers_minus1+1; i++)
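
For illustration, the following self-contained program (not part of this embodiment) exercises the pseudocode on the three-layer chain of FIG. 31, in which layer 1 directly refers to layer 0 and layer 2 directly refers to layer 1; it prints IndirectDependencyFlag[2][0]=1 and DependencyFlag[2][0]=1.

    #include <stdio.h>

    #define NUM_LAYERS 3

    int main(void) {
        /* direct_dependency_flag[i][j]: layer 1 refers to layer 0,
           layer 2 refers to layer 1 (the chain of FIG. 31). */
        int direct_dependency_flag[NUM_LAYERS][NUM_LAYERS] = {0};
        int IndirectDependencyFlag[NUM_LAYERS][NUM_LAYERS] = {0};
        int DependencyFlag[NUM_LAYERS][NUM_LAYERS] = {0};
        direct_dependency_flag[1][0] = 1;
        direct_dependency_flag[2][1] = 1;

        for (int i = 0; i < NUM_LAYERS; i++) {
            for (int j = 0; j < i; j++) {
                for (int k = j + 1; k < i; k++) {
                    if (direct_dependency_flag[k][j] && direct_dependency_flag[i][k]
                        && !direct_dependency_flag[i][j])
                        IndirectDependencyFlag[i][j] = 1;
                }
                DependencyFlag[i][j] =
                    (direct_dependency_flag[i][j] | IndirectDependencyFlag[i][j]);
            }
        }
        /* Expected: both flags are one for layer 2 with respect to layer 0. */
        printf("IndirectDependencyFlag[2][0]=%d DependencyFlag[2][0]=%d\n",
               IndirectDependencyFlag[2][0], DependencyFlag[2][0]);
        return 0;
    }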

The pseudocode can be represented in steps as follows.

(SN01) The start point of a loop pertaining to derivation of the indirect dependency flag and the dependency flag that pertain to the i-th layer. The variable i is initialized to zero before the loop starts. The process in the loop is executed when the variable i is less than the number of layers “vps_max_layers_minus1+1”. The variable i is incremented by “1” every time the process in the loop is executed one time.

(SN02) The start point of a loop pertaining to derivation of the indirect dependency flag and the dependency flag that pertain to the i-th layer and the j-th layer. Before the loop is started, the variable j is initialized to zero. The process in the loop is executed when the variable j is less than the variable i (j<i). The variable j is incremented by “1” every time the process in the loop is executed one time.

(SN03) The value of the j-th element of the indirect dependency flag IndirectDependencyFlag[i][ ] is set to zero. The value of the j-th element of the dependency flag DependencyFlag[i][ ] is set to zero. That is, IndirectDependencyFlag[i][j]=0, and DependencyFlag[i][j]=0.

(SN04) The start point of a loop for searching whether the j-th layer is the indirect reference layer of the i-th layer or not. Before the loop is started, the variable k is initialized to “j+1”. The process in the loop is executed when the variable k has a value less than the variable i. The variable k is incremented by “1” every time the process in the loop is executed one time.

(SN05) To determine whether the j-th layer is the indirect reference layer of the i-th layer, the following conditions (1) to (3) are evaluated.

(1) It is determined whether the j-th layer is the direct reference layer of the k-th layer. More specifically, when the direct dependency flag (direct_dependency_flag[k][j]) of the j-th layer to the k-th layer is one, the condition is determined to be true (direct reference layer), and when the direct dependency flag is zero (not a direct reference layer), it is determined to be false.

(2) It is determined whether the k-th layer is the direct reference layer of the i-th layer. More specifically, when the direct dependency flag (direct_dependency_flag[i][k]) of the k-th layer to the i-th layer is one, the condition is determined to be true (direct reference layer), and when the direct dependency flag is zero (not a direct reference layer), it is determined to be false.

(3) It is determined whether the j-th layer is not the direct reference layer of the i-th layer. More specifically, when the direct dependency flag (direct_dependency_flag[i][j]) of the j-th layer to the i-th layer is zero (not a direct reference layer), the condition is determined to be true, and when the direct dependency flag is one (direct reference layer), it is determined to be false.

In the case where all of the conditions (1) to (3) are true (i.e., the direct dependency flag of the j-th layer to the k-th layer direct_dependency_flag[k][j] is one, the direct dependency flag of the k-th layer to the i-th layer direct_dependency_flag[i][k] is one, and the direct dependency flag of the j-th layer to the i-th layer direct_dependency_flag[i][j] is zero), the processing transitions to step SN06. In the other cases (at least one of the conditions (1) to (3) is false, i.e., the direct dependency flag of the j-th layer to the k-th layer direct_dependency_flag[k][j] is zero, or the direct dependency flag of the k-th layer to the i-th layer direct_dependency_flag[i][k] is zero, or the direct dependency flag of the j-th layer to the i-th layer direct_dependency_flag[i][j] is one), the processing transitions to step SN07, skipping the process in step SN06.

(SN06) In the case where all the conditions (1) to (3) are true, the j-th layer is determined to be the indirect reference layer of the i-th layer, and the value of the j-th element of the indirect dependency flag IndirectDependencyFlag[i][ ] is set to one. That is, IndirectDependencyFlag[i][j]=1.

(SN07) The end point of the loop for searching whether the j-th layer is the indirect reference layer of the i-th layer or not.

(SN08) On the basis of the direct dependency flag (direct_dependency_flag[i][j]) and the indirect dependency flag (IndirectDependencyFlag[i][j]), the value of the dependency flag (DependencyFlag[i][j]) is set. More specifically, the logical sum of the value of the direct dependency flag (direct_dependency_flag[i][j]) and the value of the indirect dependency flag (IndirectDependencyFlag[i][j]) is adopted as the value of the dependency flag (DependencyFlag[i][j]). That is, derivation is made according to the following expression. In the case where the value of the direct dependency flag is one or the value of the indirect dependency flag is one, the value of the dependency flag is one. In the other case (the value of the direct dependency flag is zero and the value of the indirect dependency flag is zero), the value of the dependency flag is zero. The following deriving expression is only one example, and can be changed within the range where the value set in the dependency flag is the same.

DependencyFlag[i][j]=(direct_dependency_flag[i][j]|IndirectDependencyFlag[i][j]);

(SN0A) The end point of the loop pertaining to derivation of the indirect dependency flag and the dependency flag that pertain to the i-th layer and the j-th layer.

(SN0B) The end point of the loop pertaining to derivation of the indirect dependency flag and the dependency flag that pertain to the i-th layer.

As described above, the indirect dependency flag (IndirectDependencyFlag[i][j]) representing the dependency relationship where the i-th layer indirectly depends on the j-th layer is thus derived, which allows grasping whether the j-th layer is the indirect reference layer of the i-th layer.

The dependency flag (DependencyFlag[i][j]) representing the dependency relationship in the case where the i-th layer depends on the j-th layer (the direct dependency flag is one or the indirect dependency flag is one) is derived, which allows grasping whether the j-th layer is the direct reference layer or the indirect reference layer of the i-th layer. The deriving procedures are not limited to the aforementioned steps, and may be changed in an implementable range. For example, the indirect dependency flag and the dependency flag may be derived by the following pseudocode.

(Pseudocode)

    // derive indirect reference layers of layer i
    for(i=2; i<vps_max_layers_minus1+1; i++){
      for(k=1; k<i; k++){
        for(j=0; j<k; j++){
          if( (direct_dependency_flag[k][j] || IndirectDependencyFlag[k][j]) &&
              direct_dependency_flag[i][k] && !direct_dependency_flag[i][j] ){
            IndirectDependencyFlag[i][j] = 1;
          }
        } // end of loop on for(j=0; j<k; j++)
      } // end of loop on for(k=1; k<i; k++)
    } // end of loop on for(i=2; i<vps_max_layers_minus1+1; i++)
    // derive dependence layers (direct or indirect reference layers) of layer i
    for(i=0; i<vps_max_layers_minus1+1; i++){
      for(j=0; j<i; j++){
        DependencyFlag[i][j] = (direct_dependency_flag[i][j] | IndirectDependencyFlag[i][j]);
      } // end of loop on for(j=0; j<i; j++)
    } // end of loop on for(i=0; i<vps_max_layers_minus1+1; i++)
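
Because the condition of this variant also tests IndirectDependencyFlag[k][j], indirect dependence propagates along chains with more than one intermediate layer. The following illustrative program (not part of this embodiment) shows a four-layer chain in which layer 3 directly refers to layer 2, layer 2 to layer 1, and layer 1 to layer 0; layer 0 is detected as an indirect reference layer of both layer 2 and layer 3.

    #include <stdio.h>

    #define NUM_LAYERS 4

    int main(void) {
        /* Chain of direct references: 1->0, 2->1, 3->2. */
        int direct_dependency_flag[NUM_LAYERS][NUM_LAYERS] = {0};
        int IndirectDependencyFlag[NUM_LAYERS][NUM_LAYERS] = {0};
        direct_dependency_flag[1][0] = 1;
        direct_dependency_flag[2][1] = 1;
        direct_dependency_flag[3][2] = 1;

        for (int i = 2; i < NUM_LAYERS; i++)
            for (int k = 1; k < i; k++)
                for (int j = 0; j < k; j++)
                    if ((direct_dependency_flag[k][j] || IndirectDependencyFlag[k][j])
                        && direct_dependency_flag[i][k]
                        && !direct_dependency_flag[i][j])
                        IndirectDependencyFlag[i][j] = 1;

        /* Expected: layer 0 is an indirect reference layer of layers 2 and 3. */
        printf("IndirectDependencyFlag[2][0]=%d\n", IndirectDependencyFlag[2][0]);
        printf("IndirectDependencyFlag[3][0]=%d\n", IndirectDependencyFlag[3][0]);
        return 0;
    }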

The pseudocode can be represented in steps as follows. It is assumed that before the start of step SO01, the values of all of the elements of the indirect dependency flag IndirectDependencyFlag[ ][ ] and the dependency flag DependencyFlag[ ][ ] have already been initialized to zero.

(SO01) The start point of a loop pertaining to derivation of the indirect dependency flag pertaining to the i-th layer (layer i). Before the loop is started, the variable i is initialized to two. The process in the loop is executed when the variable i is less than the number of layers “vps_max_layers_minus1+1”. The variable i is incremented by “1” every time the process in the loop is executed one time. The variable i starts at two because an indirect reference layer can occur only on the third or higher layer.

(SO02) The start point of a loop pertaining to the k-th layer (layer k) (j<k<i), which is lower than the i-th layer (layer i) and higher than the j-th layer (layer j). Before the loop is started, the variable k is initialized to one. The process in the loop is executed when the variable k is less than the variable i (k<i). The variable k is incremented by “1” every time the process in the loop is executed one time. The variable k starts at one because an indirect reference layer occurs only on the third or higher layer.

(SO03) The start point of a loop that searches whether the layer j is the indirect reference layer of the layer i or not. Before the loop is started, the variable j is initialized to zero. The process in the loop is executed when the variable j is less than the variable k (j<k). The variable j is incremented by “1” every time the process in the loop is executed one time.

(SO04) To determine whether the layer j is the indirect reference layer of the layer i, the following conditions (1) to (3) are evaluated.

(1) It is determined whether the layer j is the direct reference layer or the indirect reference layer of the layer k. More specifically, when the direct dependency flag (direct_dependency_flag[k][j]) of the layer j to the layer k is one or the indirect dependency flag (IndirectDependencyFlag[k][j]) of the layer j to the layer k is one, the condition is determined to be true (the direct reference layer or the indirect reference layer). When the direct dependency flag is zero (not a direct reference layer) and the indirect dependency flag is zero (not an indirect reference layer), it is determined to be false.

(2) It is determined whether the layer k is the direct reference layer of the layer i. More specifically, when the direct dependency flag (direct_dependency_flag[i][k]) of the layer k to the layer i is one, the condition is determined to be true (direct reference layer), and when the direct dependency flag is zero (not a direct reference layer), it is determined to be false.

(3) It is determined whether the layer j is not the direct reference layer of the layer i. More specifically, when the direct dependency flag (direct_dependency_flag[i][j]) of the layer j to the layer i is zero (not a direct reference layer), the condition is determined to be true, and when the direct dependency flag is one (direct reference layer), it is determined to be false.

In the case where all of the conditions (1) to (3) are true (i.e., the direct dependency flag of the layer j to the layer k is one or the indirect dependency flag thereof is one, the direct dependency flag of the layer k to the layer i direct_dependency_flag[i][k] is one, and the direct dependency flag of the layer j to the layer i direct_dependency_flag[i][j] is zero), the processing transitions to step SO05. In the other cases (at least one of the conditions (1) to (3) is false, i.e., the direct dependency flag of the layer j to the layer k is zero and the indirect dependency flag thereof is zero, or the direct dependency flag of the layer k to the layer i direct_dependency_flag[i][k] is zero, or the direct dependency flag of the layer j to the layer i direct_dependency_flag[i][j] is one), the processing transitions to step SO06, skipping the process in step SO05.

(SO05) In the case where all the conditions (1) to (3) are true, the layer j is determined to be the indirect reference layer of the layer i, and the value of the j-th element of the indirect dependency flag IndirectDependencyFlag[i][ ] is set to one. That is, IndirectDependencyFlag[i][j]=1.

(SO06) The end point of the loop that searches whether the layer j is the indirect reference layer of the layer i or not.

(SO07) The end point of the loop pertaining to the layer k (j<k<i) that is lower than the layer i and higher than the layer j.

(SO08) The end point of the loop pertaining to derivation of the indirect dependency flag pertaining to the layer i.

(SO0A) The start point of a loop pertaining to derivation of the dependency flag pertaining to the layer i.

Before the loop is started, the variable i is initialized to zero. The process in the loop is executed when the variable i is less than the number of layers “vps_max_layers_minus1+1”. The variable i is incremented by “1” every time the process in the loop is executed one time.

(SO0B) The start point of a loop that searches whether the layer j is the dependence layer (the direct reference layer or the indirect reference layer) of the layer i. Before the loop is started, the variable j is initialized to zero. The process in the loop is executed when the variable j is less than the variable i (j<i). The variable j is incremented by “1” every time the process in the loop is executed one time.

(SO0C) On the basis of the direct dependency flag (direct_dependency_flag[i][j]) and the indirect dependency flag (IndirectDependencyFlag[i][j]), the value of the dependency flag (DependencyFlag[i][j]) is set. More specifically, the logical sum of the value of the direct dependency flag (direct_dependency_flag[i][j]) and the value of the indirect dependency flag (IndirectDependencyFlag[i][j]) is adopted as the value of the dependency flag (DependencyFlag[i][j]). That is, derivation is made according to the following expression. In the case where the value of the direct dependency flag is one or the value of the indirect dependency flag is one, the value of the dependency flag is one. In the other case (the value of the direct dependency flag is zero and the value of the indirect dependency flag is zero), the value of the dependency flag is zero. The following deriving expression is only one example, and can be changed within the range where the value set in the dependency flag is the same.

DependencyFlag[i][j]=(direct_dependency_flag[i][j]|IndirectDependencyFlag[i][j]);

(SO0D) The end point of the loop that searches whether the layer j is the dependence layer (the direct reference layer or the indirect reference layer) of the layer i.

(SO0E) The end point of the loop pertaining to derivation of the dependency flag pertaining to the layer i.

As described above, the indirect dependency flag (IndirectDependencyFlag[i][j]) representing the dependency relationship where the layer i indirectly depends on the layer j is thus derived, which allows grasping whether the layer j is the indirect reference layer of the layer i. The dependency flag (DependencyFlag[i][j]) representing the dependency relationship in the case where the layer i depends on the layer j (the direct dependency flag is one or the indirect dependency flag is one) is thus derived, which allows grasping whether the layer j is the dependence layer (the direct reference layer or the indirect reference layer) of the layer i. The deriving procedures are not limited to the aforementioned steps, and may be changed in an implementable range.

In the above example, the dependency flag DependencyFlag[i][j], indicating whether the j-th layer with respect to the i-th layer is a direct reference layer or an indirect reference layer, is derived for indices i and j over all the layers. Alternatively, the layer identifier of the i-th layer nuhLId # i and the layer identifier of the j-th layer nuhLId # j may be used to derive the dependency flag between layer identifiers (inter-layer-identifiers dependency flag LIdDependencyFlag[ ][ ]). In this case, in the aforementioned step SN08 or SO0C, with the layer identifier of the i-th layer nuhLId # i as the first index and the layer identifier of the j-th layer nuhLId # j as the second index, the value of the inter-layer-identifiers dependency flag (LIdDependencyFlag[nuhLId # i][nuhLId # j]) is derived. That is, as described below, in the case where the value of the direct dependency flag is one or the value of the indirect dependency flag is one, the value of the inter-layer-identifiers dependency flag is one. In the other case (the value of the direct dependency flag is zero and the value of the indirect dependency flag is zero), the value of the inter-layer-identifiers dependency flag is zero.

LIdDependencyFlag[nuhLId # i][nuhLId # j]=(direct_dependency_flag[i][j]|IndirectDependencyFlag[i][j]);
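
A sketch of this derivation is shown below; it assumes an array layer_id_in_nuh[i] holding the layer identifier nuhLId # i of the i-th layer, and the array bound is illustrative.

    #define MAX_LAYERS 64   /* illustrative bound (a layer identifier is 6 bits) */

    /* Lift the per-index dependency flags to layer identifiers. */
    void derive_lid_dependency_flags(
        int numLayers,
        const int layer_id_in_nuh[MAX_LAYERS],
        const int direct_dependency_flag[MAX_LAYERS][MAX_LAYERS],
        const int IndirectDependencyFlag[MAX_LAYERS][MAX_LAYERS],
        int LIdDependencyFlag[MAX_LAYERS][MAX_LAYERS])
    {
        for (int i = 0; i < numLayers; i++)
            for (int j = 0; j < i; j++)
                LIdDependencyFlag[layer_id_in_nuh[i]][layer_id_in_nuh[j]] =
                    (direct_dependency_flag[i][j] | IndirectDependencyFlag[i][j]);
    }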

As described above, the inter-layer-identifiers dependency flag (LIdDependencyFlag[nuhLId # i][nuhLId # j]), representing whether the i-th layer having the layer identifier nuhLId # i directly or indirectly depends on the j-th layer having the layer identifier nuhLId # j or not, is thus derived, which allows grasping whether the j-th layer having the layer identifier nuhLId # j is a direct reference layer or an indirect reference layer of the i-th layer having the layer identifier nuhLId # i. The aforementioned procedures are not limited thereto, and may be changed in an implementable range.

(Sequence Parameter Set SPS)

The sequence parameter set SPS defines a set of coding parameters referred to by the image decoding apparatus 1 for decoding the target sequence.

The active VPS identifier is an identifier for designating the active VPS to which the target SPS refers, and is included in the SPS as the syntax “sps_video_parameter_set_id” (SYNSPS01 in FIG. 15). The parameter set decoding unit 12 may decode the active VPS identifier included in the sequence parameter set SPS that is the decoding target, read, from the parameter set management unit 13, the coding parameter of the active VPS designated by the active VPS identifier, and refer to the active VPS coding parameter when decoding each subsequent syntax of the decoding target SPS. In the case where each syntax of the decoding target SPS does not depend on the active VPS coding parameter, the VPS activating process at the time of decoding the active VPS identifier of the decoding target SPS is not required.

The SPS identifier is an identifier for identifying each SPS, and is included in the SPS as the syntax “sps_seq_parameter_set_id” (SYNSPS02 in FIG. 15). The SPS identified by the active SPS identifier (pps_seq_parameter_set_id) included in the after-mentioned PPS is referred to during the process of decoding the coded data on the target layer in the target layer set.

(Picture Information)

The SPS includes, as picture information, information that defines the size of a decoded picture on the target layer. For example, the picture information includes information on the width and height of the decoded picture on the target layer. The picture information decoded from the SPS contains the width of the decoded picture (pic_width_in_luma_samples) and the height of the decoded picture (pic_height_in_luma_samples) (not shown in FIG. 15). The value of the syntax “pic_width_in_luma_samples” corresponds to the width of the decoded picture in units of luma pixels. The value of the syntax “pic_height_in_luma_samples” corresponds to the height of the decoded picture in units of luma pixels.

A syntax group indicated by SYNSPS04 in FIG. 15 is information (scaling list information) pertaining to the scaling list (quantization matrix) used throughout the entire target sequence. In the scaling list information, “sps_infer_scaling_list_flag” (SPS scaling list estimation flag) is a flag that indicates whether or not the scaling list information on the target SPS is estimated from the scaling list information on the active SPS on the reference layer indicated by “sps_scaling_list_ref_layer_id”. When the SPS scaling list estimation flag is one, the scaling list information on the SPS is estimated (copied) from the scaling list information on the active SPS on the reference layer identified by “sps_scaling_list_ref_layer_id”. When the SPS scaling list estimation flag is zero, the scaling list information is notified through the SPS on the basis of “sps_scaling_list_data_present_flag”.
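
The branching described above may be sketched as follows; the helper functions are placeholders for the internal processing of the parameter set decoding unit 12, not actual interfaces of the apparatus.

    /* Assumed helpers standing in for the parameter set decoding unit's
       internal processing; they are not interfaces of the apparatus. */
    extern void copy_scaling_list_from_sps(int refLayerId);
    extern void decode_scaling_list_data(void);

    void decode_sps_scaling_list_info(int sps_infer_scaling_list_flag,
                                      int sps_scaling_list_ref_layer_id,
                                      int sps_scaling_list_data_present_flag)
    {
        if (sps_infer_scaling_list_flag) {
            /* Estimate (copy) from the active SPS of the reference layer. */
            copy_scaling_list_from_sps(sps_scaling_list_ref_layer_id);
        } else if (sps_scaling_list_data_present_flag) {
            /* Scaling list data is signalled in this SPS itself. */
            decode_scaling_list_data();
        }
    }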

An SPS extension data presence/absence flag “sps_extension_flag” (SYNSPS05 in FIG. 15) is a flag representing whether the SPS further includes the SPS extension data sps_extension( ) or not (SYNSPS06 in FIG. 15).

The SPS extension data (sps_extension( )) contains, for example, inter-layer position correspondence information (SYNSPS0A in FIG. 16) and the like.

(Picture Parameter Set PPS)

The picture parameter set PPS defines a set of coding parameters referred to by the image decoding apparatus 1 for decoding each picture in the target sequence.

The PPS identifier is an identifier for identifying each PPS, and is included in the PPS as the syntax “pps_pic_parameter_set_id” (SYNPPS01 in FIG. 17). The PPS identified by the active PPS identifier (slice_pic_parameter_set_id) included in the after-mentioned slice header is referred to during the process of decoding the coded data on the target layer in the target layer set.

The active SPS identifier is an identifier for designating the active SPS to which the target PPS refers, and is included in the PPS as the syntax “pps_seq_parameter_set_id” (SYNPPS02 in FIG. 17). The parameter set decoding unit 12 may decode the active SPS identifier included in the picture parameter set PPS that is the decoding target, read, from the parameter set management unit 13, the coding parameter of the active SPS designated by the active SPS identifier, invoke the coding parameter of the active VPS to which the active SPS refers, and refer to the coding parameters of the active SPS and active VPS when decoding each subsequent syntax of the decoding target PPS. In the case where each syntax of the decoding target PPS does not depend on the coding parameters of the active SPS and active VPS, the SPS and VPS activating processes at the time of decoding the active SPS identifier of the decoding target PPS are not required.

A syntax group indicated by SYNPPS03 in FIG. 17 is information (scaling list information) pertaining to the scaling list (quantization matrix) used when a picture that refers to the target PPS is decoded. In the scaling list information, “pps_infer_scaling_list_flag” (PPS scaling list estimation flag) is a flag that indicates whether or not the scaling list information on the target PPS is estimated from the scaling list information on the active PPS on the reference layer identified by “pps_scaling_list_ref_layer_id”. When the PPS scaling list estimation flag is one, the scaling list information on the PPS is estimated (copied) from the scaling list information on the active PPS on the reference layer identified by “pps_scaling_list_ref_layer_id”. When the PPS scaling list estimation flag is zero, the scaling list information is notified through the PPS on the basis of “pps_scaling_list_data_present_flag”.

(Picture Decoding Unit 14)

The picture decoding unit 14 generates the decoded picture on the basis of the input VCL NAL unit and the active parameter set, and outputs the decoded picture.

Referring to FIG. 20, the schematic configuration of the picture decoding unit 14 is described. FIG. 20 is a functional block diagram showing the schematic configuration of the picture decoding unit 14.

The picture decoding unit 14 includes a slice header decoding unit 141 and a CTU decoding unit 142. The CTU decoding unit 142 further includes a predictive residue restoring unit 1421, a predictive image generating unit 1422, and a CTU decoded image generating unit 1423.

(Slice Header Decoding Unit 141)

The slice header decoding unit 141 decodes the slice header on the basis of the input VCL NAL unit and the active parameter set. The decoded slice header is output to the CTU decoding unit 142, together with the input VCL NAL unit.

(CTU Decoding Unit 142)

In a schematic view, the CTU decoding unit 142 performs decoding to obtain a decoded image of the region corresponding to each of the CTUs included in the slices constituting the picture, on the basis of the input slice header, the slice data included in the VCL NAL unit, and the active parameter set, thus generating the decoded image of the slice. The CTU size for the target layer included in the active parameter set (the syntax elements log2_min_luma_coding_block_size_minus3 and log2_diff_max_min_luma_coding_block_size in SYNSPS03 in FIG. 15) is used as the CTU size here. The decoded image of the slice is output, as a part of the decoded picture, to the slice position indicated by the input slice header. The decoded image of the CTU is generated by the predictive residue restoring unit 1421, the predictive image generating unit 1422, and the CTU decoded image generating unit 1423 in the CTU decoding unit 142.
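
For reference, a sketch of the CTU size derivation from the two SPS syntax elements named above is shown below, assuming the HEVC-style derivation CtbLog2SizeY = (log2_min_luma_coding_block_size_minus3 + 3) + log2_diff_max_min_luma_coding_block_size.

    /* CTU size in luma samples from the SPS syntax elements (HEVC-style). */
    unsigned ctu_size(unsigned log2_min_luma_coding_block_size_minus3,
                      unsigned log2_diff_max_min_luma_coding_block_size)
    {
        unsigned minCbLog2SizeY = log2_min_luma_coding_block_size_minus3 + 3;
        unsigned ctbLog2SizeY =
            minCbLog2SizeY + log2_diff_max_min_luma_coding_block_size;
        return 1u << ctbLog2SizeY;   /* CTU width/height in luma samples */
    }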

The predictive residue restoring unit 1421 decodes the predictive residue information (TT information) contained in the input slice data, and generates and outputs the predictive residue of the target CTU.

The predictive image generating unit 1422 generates the predictive image on the basis of the prediction method and prediction parameters indicated by the predictive information (PT information) contained in the input slice data, and outputs the image. Here, the decoded image of the reference picture and the coding parameters therefor are used if necessary. For example, in the case of using inter prediction or inter-layer image prediction, the corresponding reference picture is read from the decoded picture management unit 15.

The CTU decoded image generating unit 1423 adds the input predictive image to the predictive residue, and generates and outputs the decoded image of the target CTU.

<Decoding Process in Picture Decoding Unit 14>

Hereinafter, referring to FIG. 21, the schematic operation of the picture decoding on the target layer i in the picture decoding unit 14 is described. FIG. 21 is a flowchart showing the decoding process in units of slices that constitute the picture on the target layer i in the picture decoding unit 14.

(SD101) The leading slice flag (first_slice_segment_in_pic_flag) of the decoding target slice is decoded. When the leading slice flag is one, the decoding target slice is the leading slice in the decoding order in the picture (hereinafter, processing order), and the position (hereinafter, CTU address) of the leading CTU in the decoding target slice, in the raster scanning order in the picture, is set to zero. Furthermore, a counter numCtu of the number of processed CTUs in the picture (hereinafter, the number of processed CTUs numCtu) is set to zero. When the leading slice flag is zero, the leading CTU address of the decoding target slice is set on the basis of the slice address decoded in SD106, which will be described later.

(SD102) The active PPS identifier (slice_pic_parameter_set_id), which designates the active PPS referred to during decoding of the decoding target slice, is decoded.

(SD104) The active parameter set is fetched from the parameter set management unit 13. That is, the PPS having the PPS identifier (pps_pic_parameter_set_id) identical to the active PPS identifier (slice_pic_parameter_set_id) to which the decoding target slice refers is regarded as the active PPS, and the coding parameter of the active PPS is fetched (read) from the parameter set management unit 13. Furthermore, the SPS having the SPS identifier (sps_seq_parameter_set_id) identical to the active SPS identifier (pps_seq_parameter_set_id) in the active PPS is regarded as the active SPS, and the coding parameter of the active SPS is fetched from the parameter set management unit 13. Moreover, the VPS having the VPS identifier (vps_video_parameter_set_id) identical to the active VPS identifier (sps_video_parameter_set_id) in the active SPS is regarded as the active VPS, and the coding parameter of the active VPS is fetched from the parameter set management unit 13.

(SD105) It is determined, on the basis of the leading slice flag, whether the decoding target slice is the leading slice in the processing order in the picture or not. In the case where the leading slice flag is zero (YES in SD105; the slice is not the leading slice), the processing transitions to step SD106. In the other case (NO in SD105), the process in SD106 is skipped. In the case where the leading slice flag is one, the slice address of the decoding target slice is zero.

(SD106) The slice address (slice_segment_address) of the decoding target slice is decoded, and the leading CTU address of the decoding target slice is set. For example, the leading slice CTU address = slice_segment_address.

. . . not shown . . . .

(SD10A) The CTU decoding unit 142 generates a CTU decoded image of the region corresponding to each of the CTUs included in the slices constituting the picture, on the basis of the input slice header, the active parameter set, and each piece of CTU information (SYNSD01 in FIG. 18) in the slice data included in the VCL NAL unit. Furthermore, after each piece of CTU information, there is a slice end flag (end_of_slice_segment_flag) indicating whether the CTU is the end of the decoding target slice or not (SYNSD02 in FIG. 18). After each CTU is decoded, the value of the number of processed CTUs is incremented by one (numCtu++).

(SD10B) It is determined whether the CTU is the end of the decoding target slice or not on the basis of the slice end flag. In the case where the slice end flag is one (YES in SD10B), the processing transitions to step SD10C. In the other case (NO in SD10B), the processing transitions to SD10A to decode the subsequent CTU information.

(SD10C) It is determined whether the number of processed CTUs numCtu reaches the total number of CTUs (PicSizeInCtbsY) constituting the picture or not. That is, it is determined whether numCtu==PicSizeInCtbsY or not. In the case where numCtu is equal to PicSizeInCtbsY (YES in SD10C), the decoding process in units of slices that constitute the decoding target picture is finished. In the other case (numCtu<PicSizeInCtbsY) (NO in SD10C), the processing transitions to SD101 to continue the decoding process in units of slices that constitute the decoding target picture.
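
The flow from SD101 to SD10C may be condensed into the following sketch; the decode_* helpers merely stand in for the syntax decoding described in the corresponding steps and are not interfaces of the apparatus.

    /* Illustrative helper declarations standing in for per-step decoding. */
    extern int decode_first_slice_flag(void);        /* SD101 */
    extern int decode_active_pps_id(void);           /* SD102 */
    extern void activate_parameter_sets(int ppsId);  /* SD104: PPS -> SPS -> VPS */
    extern int decode_slice_address(void);           /* SD106 */
    extern void decode_ctu(int ctuAddr);             /* SD10A */
    extern int decode_end_of_slice_flag(void);       /* SD10B */
    extern int PicSizeInCtbsY;                       /* total CTUs in the picture */

    void decode_picture_slices(void)
    {
        int numCtu = 0;
        do {
            int first_slice = decode_first_slice_flag();            /* SD101 */
            activate_parameter_sets(decode_active_pps_id());        /* SD102, SD104 */
            int ctuAddr = first_slice ? 0 : decode_slice_address(); /* SD105, SD106 */
            do {
                decode_ctu(ctuAddr++);                              /* SD10A */
                numCtu++;
            } while (!decode_end_of_slice_flag());                  /* SD10B */
        } while (numCtu < PicSizeInCtbsY);                          /* SD10C */
    }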

The operation of the picture decoding unit 14 according to Embodiment 1 is thus described above. The steps are not limited to the above steps; alternatively, the steps may be changed in an implementable range.

(Advantageous Effects of Video Decoding Apparatus 1)

The hierarchical video decoding apparatus 1 (hierarchical image decoding apparatus) according to this embodiment, which has been described above, includes the bit stream extraction unit 17 that performs the bit stream extraction process on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside and of the target highest temporal identifier HighestTidTarget, removes (discards), from the input hierarchically coded data DATA, the NAL units that are not included in a set (called a target set TargetSet) defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, and thus extracts the target layer set coded data DATA # T (BitstreamToDecode) made up of NAL units included in the target set TargetSet. Furthermore, in the case where the layer identifier of the video parameter set is not included in the target set TargetSet, the bit stream extraction unit 17 is characterized by updating (rewriting) the layer identifier of the video parameter set to the lowest layer identifier in the target set TargetSet. The operation of the bit stream extraction unit 17 assumes that “the AU constituting the input hierarchically coded data DATA includes one (at the maximum) VPS having the lowest layer identifier in the AU”. However, the operation is not limited thereto. For example, a VPS having a layer identifier other than the lowest layer identifier in the AU may be included in the AU. In this case, in step SG104 in FIG. 27, the bit stream extraction unit 17 may regard, as the VPS whose layer identifier is to be updated, the VPS having the lowest layer identifier among the layer identifiers that are not included in the target set TargetSet. Typically, since the VPS having the layer identifier “nuhLayerId=0” is the VPS having the lowest layer identifier, this VPS is regarded as the one to be updated, and the other VPSs that are not included in the target set TargetSet are discarded.

Consequently, the hierarchical video decoding apparatus 1 according to this embodiment can prevent the problem in that the VPS is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and that only includes a layer set made up of a subset of that layer set.

As the parameter sets (VPS, SPS and PPS) used to decode the target layer, the parameter sets used to decode the reference layer are shared (referred to), which can omit the decoding process pertaining to the parameter sets on the target layer. That is, the parameter sets can be decoded with a smaller amount of code.

(Variation Example 1 of Bit Stream Extraction Unit 17)

As shown in FIG. 27, according to the bit stream extraction unit 17 of Embodiment 1, in the case where the VPS layer identifier is not included in the target set TargetSet, the VPS layer identifier is updated (rewritten) to the lowest layer identifier in the target set TargetSet, to thereby cause the VPS to be necessarily included in the target set TargetSet. However, the configuration is not limited thereto. For example, it may be so configured that the bit stream extraction unit 17 does not update the layer identifier of a VPS that is not included in the layer ID list LayerIdListTarget constituting the target set TargetSet, omits the discarding of the NAL unit of the VPS, and includes the VPS in the bit stream of the extracted target set TargetSet. Hereinafter, referring to FIG. 28, the operation of the bit stream extraction unit 17′ according to Variation Example 1 is described. Operations common to those of the bit stream extraction unit 17 according to Embodiment 1 are assigned the same symbols, and are not described.

(SG102a) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. Here, to satisfy the conformance condition CY1, which will be described below, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32) or not. That is, when the NAL unit type is the video parameter set (YES in SG102a), the processing transitions to step SG107, omitting the operations pertaining to the VPS in steps SG105a and SG106a. In the other case (NO in SG102a), the processing transitions to step SG105a.

(SG105a) It is determined whether or not the layer identifier and the temporal identifier of the target NAL unit are included in the target set TargetSet, on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget and of the target highest temporal identifier. The detailed operations are the same as those in step SG105 in FIG. 27; consequently, the description thereof is omitted.

(SG106a) The target NAL unit is discarded. That is, as the target NAL unit is not included in the target set TargetSet, the bit stream extraction unit 17′ removes the target NAL unit from the input hierarchically coded data DATA.

The operation of the bit stream extraction unit 17′ according to Variation Example 1 is thus described above. The steps are not limited to the above steps; alternatively, the steps may be changed in an implementable range.

The bit stream extraction unit 17′ according to Variation Example 1, which has been described above, is characterized by performing the bit stream extraction process on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside and of the target highest temporal identifier HighestTidTarget, removing (discarding), from the input hierarchically coded data DATA, the NAL units that are not included in the target set TargetSet defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, except the NAL units whose NAL unit type is VPS, and thus extracting and outputting the target layer set coded data DATA # T (BitstreamToDecode) made up of NAL units included in the target set TargetSet. In other words, even in the case where the layer identifier of the video parameter set is not included in the target set TargetSet, the bit stream extraction unit 17′ does not discard the NAL unit of the video parameter set, and includes the VPS in the bit stream of the target set TargetSet.
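
A minimal sketch of this keep/discard rule is given below; the membership helper and the representation of the layer ID list are assumptions for illustration.

    /* Illustrative declarations; names follow the text, the helper is assumed. */
    extern int in_layer_id_list(int layerId, const int *list);
    extern int LayerIdListTarget[];
    extern int HighestTidTarget;

    /* Variation 1 keep rule: VPS NAL units are never discarded; any other NAL
       unit is kept only if its layer and temporal identifiers lie in TargetSet. */
    int keep_nal_unit_v1(int nal_unit_type, int layerId, int temporalId)
    {
        if (nal_unit_type == 32)   /* VPS_NUT */
            return 1;
        return in_layer_id_list(layerId, LayerIdListTarget)
            && temporalId <= HighestTidTarget;
    }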

The operation of the bit stream extraction unit 17′ assumes that “the AU constituting the input hierarchically coded data DATA includes one (at the maximum) VPS having the lowest layer identifier in the AU”. However, the operation is not limited thereto. For example, a VPS having a layer identifier other than the lowest layer identifier in the AU may be included in the AU. In this case, the bit stream extraction unit 17′ may add, to step SG102a, the condition “whether the VPS layer identifier is the lowest layer identifier among the layer identifiers that are not included in the target set TargetSet”. Typically, since the VPS having the layer identifier “nuhLayerId=0” is the VPS having the lowest layer identifier, this VPS is regarded as the VPS to be included in the target set TargetSet, and the other VPSs that are not included in the target set TargetSet are discarded.

Consequently, the bit stream extraction unit 17′ according to Variation Example 1 described above can prevent the problem in that the VPS is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and that only includes a layer set made up of a subset of that layer set.

(Constraint on VPS According to Variation Example 1 of Bit Stream Extraction Unit 17)

To perform the bit stream extraction described with reference to the bit stream extraction unit 17′ according to Variation Example 1, the bit stream is required to satisfy the following condition CY1 as a bit stream conformance.

CY1: “the target set TargetSet (layer set) includes the VPS having the layer identifier equal to the lowest layer identifier among those of all the layers”.

In other words, the bit stream constraint CY1 is: “the VPS included in the access unit belongs to the same layer as the VCL having the lowest layer identifier among all the layers (including layers that are not included in the access unit)”.

Here, “the VPS included in the access unit belongs to the same layer as that of the VCL having the lowest layer identifier among all the layers (including layers that are not included in the access unit)” means that, in the case where a layer in a layer set B that is a subset of a layer set A refers, in the layer set A, to the VPS on a layer that is included in the layer set A but is not included in the layer set B, the layer set B extracted through bit stream extraction includes a VPS having the same coding parameters as the aforementioned VPS. A VPS having the same coding parameters as the aforementioned VPS means a VPS whose VPS identifier and other syntax are the same as those of the aforementioned VPS, except for the layer identifier and the temporal identifier. Consequently, provision of this bit stream constraint can solve the problem in that the VPS is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and that only includes a layer set made up of a subset of that layer set.

In the aforementioned conformance condition CY1, “the target set TargetSet (layer set) includes the VPS having the same layer identifier as the lowest layer identifier among those of all the layers”. The configuration is not limited thereto. For example, the lowest layer identifier may be fixed to the layer identifier nuhLayerId=0 (nuh_layer_id=0). That is, in consideration of the bit stream conformance, the bit stream is required to satisfy at least the following condition CY2.

CY2: “the target set TargetSet (layer set) includes the VPS having the same layer identifier as the layer identifier nuh_layer_id=0”.

The conformance condition CY2 also exerts advantageous effects analogous to those of the conformance condition CY1. Furthermore, under the constraints of the conventional arts (Non-Patent Literatures 2 and 3) (the VPS layer identifier is zero), when a certain layer in TargetSet refers to the VPS with nuh_layer_id=0 during bit stream extraction, the VPS with nuh_layer_id=0, even though it is not included in TargetSet, is not discarded. Consequently, the layer in TargetSet can be prevented from being undecodable.

(Variation Example 1a of Bit Stream Extraction Unit 17)

Furthermore, under the constraints of the conventional art (Non-Patent Literature 4) (the layer identifiers of VPS/SPS/PPS are zero), in addition to CY2, at least CY3 and CY4 are required to be satisfied as bit stream conformances.

CY3: “the target set TargetSet (layer set) includes the SPS having the same layer identifier as the layer identifier nuh_layer_id=0”.

CY4: “the target set TargetSet (layer set) includes the PPS having the layer identifier equal to the layer identifier nuh_layer_id=0”.

In the case of applying the bit stream constraints CY2 to CY4, it may be so configured that the operation of the bit stream extraction unit 17′ according to Variation Example 1 (step SG102a in FIG. 28) is changed to the following process (SG102a′).

(SG102a′) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. Here, to satisfy the conformance conditions CY2 to CY4, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32), “SPS_NUT” (nal_unit_type==33), or “PPS_NUT” (nal_unit_type==34). That is, in the case where the NAL unit type is any parameter set of the VPS, SPS and PPS (YES in SG102a′), the processing transitions to step SG107. In the other case (NO in SG102a′), the processing transitions to step SG105a.
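
The test of SG102a′ reduces to a membership check on the NAL unit type, as in the following sketch.

    /* Parameter-set test of SG102a': VPS_NUT==32, SPS_NUT==33, PPS_NUT==34. */
    int is_parameter_set(int nal_unit_type)
    {
        return nal_unit_type == 32   /* VPS_NUT */
            || nal_unit_type == 33   /* SPS_NUT */
            || nal_unit_type == 34;  /* PPS_NUT */
    }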

The operation of the bit stream extraction is thus described above. The steps are not limited to the above steps; alternatively, the steps may be changed in an implementable range. The bit stream extraction unit 17′ having been subjected to the aforementioned change is called a bit stream extraction unit 17′a according to Variation Example 1a.

Furthermore, according to the bit stream extraction unit 17′a, under the constraints of the conventional art (Non-Patent Literature 4) (the VPS/SPS/PPS layer identifiers are zero), when a certain layer in TargetSet refers to the VPS/SPS/PPS with nuh_layer_id=0 during bit stream extraction, the VPS/SPS/PPS with nuh_layer_id=0, even though it is not included in TargetSet, is not discarded. Consequently, the layer in TargetSet can be prevented from being undecodable.

(Variation Example 2 of Bit Stream Extraction Unit 17)

As shown in FIG. 27, according to the bit stream extraction unit 17 of Embodiment 1, in the case where the VPS layer identifier is not included in the target set TargetSet, the VPS layer identifier is updated (rewritten) to the lowest layer identifier in the target set TargetSet, to thereby cause the VPS to be necessarily included in the target set TargetSet. However, the configuration is not limited thereto. For example, it may be so configured that the bit stream extraction unit 17 omits the discarding of the VCL and non-VCL NAL units pertaining to the reference layers (the direct reference layers and the indirect reference layers) on which each layer in the target set TargetSet depends and which are not included in the layer ID list LayerIdListTarget constituting the target set TargetSet, so that the extracted target set TargetSet includes these VCL and non-VCL NAL units. Hereinafter, referring to FIG. 29, the operation of the bit stream extraction unit 17″ according to Variation Example 2 is described. Operations common to those of the bit stream extraction unit 17 according to Embodiment 1 are assigned the same symbols, and are not described. The bit stream extraction unit 17″ has the same function as the VPS decoding means in the parameter set decoding unit 12 in order to derive the dependence layers from the VPS coding parameter. FIG. 29 is a flowchart showing the bit stream extraction process in units of the access unit in the bit stream extraction unit 17″.

(SG102b) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. Here, to satisfy the conformance condition CZ1, which will be described below, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32) or not. That is, in the case where the NAL unit type is the video parameter set (YES in SG102b), the processing transitions to step SG103. In the other case (NO in SG102b), the processing transitions to step SG105.

(SG10B) The bit stream extraction unit 17″ decodes the target NAL unit, which is the VPS, and derives the dependence layer (dependence layer set) of each layer included in the target set TargetSet. More specifically, according to the procedures described in (Derivation of Reference Layer ID List and Direct Reference Layer IDX List) and (Derivation of Indirect Dependency Flag and Dependency Flag), the inter-layer-identifiers dependency flag LIdDependencyFlag[ ][ ], indicating whether the j-th layer with the layer identifier nuhLId # j is the direct reference layer or the indirect reference layer of the i-th layer with the layer identifier nuhLId # i, is derived. Instead of the inter-layer-identifiers dependency flag, the dependency flag DependencyFlag[i][j] may be derived, which indicates whether the j-th layer (layer identifier nuhLId # j) is the direct reference layer or the indirect reference layer of the i-th layer (layer identifier nuhLId # i).

(SG105b) It is determined whether or not the layer identifier and the temporal identifier of the target NAL unit are included in the target set TargetSet, or whether or not the layer identifier of the target NAL unit belongs to a dependence layer of a layer included in the target set TargetSet, on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget, of the target highest temporal identifier, and of the dependency flag (inter-layer-identifiers dependency flag LIdDependencyFlag[ ][ ]). More specifically, it is determined whether the following conditions (1) to (3) are satisfied or not. In the case where the condition (2) is satisfied and at least one of the conditions (1) and (3) is satisfied (YES in SG105b), the target NAL unit is to be kept, and the processing transitions to step SG107. In the other case (NO in SG105b), the processing transitions to step SG106b.

(1) In the case where “a value identical to the layer identifier of the target NAL unit is in the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget”, the condition is determined to be true. In the other case (a value identical to the layer identifier of the target NAL unit is not in the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget), it is determined to be false.

(2) In the case where “the temporal identifier of the target NAL unit is equal to or less than the target highest temporal identifier HighestTidTarget”, the condition is determined to be true. In the other case (the temporal identifier of the target NAL unit is greater than the target highest temporal identifier HighestTidTarget), it is determined to be false.

(3) It is determined whether the layer having the layer identifier nuhLayerId of the target NAL unit is the direct reference layer or the indirect reference layer of each layer (LayerIdListTarget[k] (k=0 . . . n−1, where n is the number of layers included in LayerSetTarget)) included in the target layer set LayerSetTarget, on the basis of the inter-layer-identifiers dependency flag LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId] (k=0 . . . n−1).

More specifically, in the case where, for any of the layers k included in the target layer set LayerSetTarget, the value of the inter-layer-identifiers dependency flag LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId] is one, the condition is determined to be true. In the other case (for all the layers k included in the target layer set LayerSetTarget, the value of the inter-layer-identifiers dependency flag LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId] is zero), it is determined to be false. The determination may instead be based on the variable DepFlag, which is derived by the following expression as the logical sum of the inter-layer-identifiers dependency flags LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId] (k=0 . . . n−1). That is, in the case where DepFlag is one, the condition is determined to be true; in the case where DepFlag is zero, it is determined to be false.

DepFlag=0; for(k=0; k<n; k++){ DepFlag |= LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId]; }
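
As an illustration only, the determination in step SG105 b may be sketched in C as follows. The function name and the use of layer identifiers as array indices are assumptions introduced here; cond1 to cond3 correspond to conditions (1) to (3), cond3 is the DepFlag of the expression above, and the combination follows the “at least any of the conditions” wording of this step.

#include <stddef.h>

#define MAX_LAYERS 64

int sg105b_condition(int nuhLayerId, int temporalId,
                     const int *LayerIdListTarget, size_t n,
                     int HighestTidTarget,
                     int LIdDependencyFlag[][MAX_LAYERS])
{
    size_t k;
    int cond1 = 0;                                /* (1) in the layer ID list */
    int cond2 = (temporalId <= HighestTidTarget); /* (2) temporal identifier  */
    int cond3 = 0;                                /* (3) dependence layer     */
    for (k = 0; k < n; k++) {
        if (LayerIdListTarget[k] == nuhLayerId)
            cond1 = 1;
        if (LIdDependencyFlag[LayerIdListTarget[k]][nuhLayerId])
            cond3 = 1;                            /* DepFlag of the text      */
    }
    return cond1 || cond2 || cond3;
}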

(SG106 b) The target NAL unit is discarded. That is, as the target NAL unit is not included in the target set TargetSet or in a dependence layer of the target set TargetSet, the bit stream extraction unit 17″ removes the target NAL unit from the input hierarchically coded data DATA.

The operation of the bit stream extraction unit 17″ according to Variation Example 2 has thus been described above. The steps are not limited to the above steps. Alternatively, the steps may be changed in an implementable range.

The bit stream extraction unit 17″ according to Variation Example 2, which has been described above, is characterized by performing the bit stream extraction process on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside, of the target highest temporal identifier HighestTidTarget, and of the dependence layer information (the dependency flag LIdDependencyFlag[ ][ ] or DependencyFlag[ ][ ]) derived from the VPS; removing (discarding), from the input hierarchically coded data DATA, the NAL units that are included neither in the target set TargetSet, which is defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, nor in a dependence layer of the target set TargetSet; and thus extracting and outputting the target layer set coded data DATA # T (BitstreamToDecode) made up of the NAL units included in the target set TargetSet and the NAL units included in its dependence layers. In other words, the bit stream extraction unit 17″ does not discard the NAL units contained in a dependence layer of the target set TargetSet, and includes those NAL units in the bit stream of the target set TargetSet.

The operation of the bit stream extraction unit 17″ assumes that “the AU constituting the input hierarchically coded data DATA includes one (at the maximum) VPS having the lowest layer identifier in the AU”. However, the operation is not limited thereto. For example, a VPS having a layer identifier other than the lowest layer identifier in the AU may be included in the AU. In this case, the bit stream extraction unit 17″ may regard the VPS having the lowest layer identifier among the layer identifiers that are not included in the target set TargetSet as the VPS from which the dependence information is derived in steps SG102 b to SG10B. Typically, since the VPS having the layer identifier “nuhLayerId=0” is the VPS having the lowest layer identifier, the layer dependence information may be derived from this VPS, and the other VPSs that are not included in the target set TargetSet are discarded.

The aforementioned bit stream extraction unit 17″ according to this embodiment can prevent the problem in that, in the bit stream after bit stream extraction, the layer set does not include the VCL and non-VCL NAL units pertaining to the dependence layers referred to by the layers in the layer set (the direct reference layers or the indirect reference layers). That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

(Bit Stream Constraint According to Variation Example 2 of Bit Stream Extraction Unit 17)

To perform the bit stream extraction described with reference to the bit stream extraction unit 17″ according to Variation Example 2, the bit stream is required to satisfy at least the following condition CZ1 as a bit stream conformance.

CZ1: “The target set TargetSet (layer set) includes the dependence layers which each layer in the target set TargetSet depends on (refers to)”.

In other words, this bit stream constraint CZ1 is “the dependence layer to which a certain target layer in the layer set refers is included in the same layer set”.

“In the layer set, the dependence layer to which a certain target layer in the layer set refers is included in the same layer set” means that “reference from a layer in a certain layer set B, which is a subset of the layer set A, to the VCL or non-VCL of a layer that is included in the layer set A but not included in the layer set B is prohibited”. Consequently, providing this bit stream constraint can solve the problem in that, in the bit stream after bit stream extraction, the VCL and non-VCL NAL units pertaining to the dependence layers referred to from the layers in the layer set (the direct reference layers or the indirect reference layers) are not included. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.
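
As an illustration only, the constraint CZ1 may be checked as sketched below, reusing the LIdDependencyFlag derivation above; the function name and the fixed array size are assumptions introduced here. The function returns 1 when every direct or indirect reference layer of every layer in the layer set is itself included in the set.

#include <stddef.h>

#define MAX_LAYERS 64

int check_cz1(const int *LayerIdListTarget, size_t n,
              int LIdDependencyFlag[][MAX_LAYERS])
{
    size_t k, m;
    int j, found;
    for (k = 0; k < n; k++)
        for (j = 0; j < MAX_LAYERS; j++)
            if (LIdDependencyFlag[LayerIdListTarget[k]][j]) {
                /* Layer j is a reference layer of a layer in the set;
                 * it must itself appear in the layer ID list. */
                found = 0;
                for (m = 0; m < n; m++)
                    if (LayerIdListTarget[m] == j)
                        found = 1;
                if (!found)
                    return 0;   /* CZ1 violated */
            }
    return 1;                   /* CZ1 satisfied */
}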

(Variation Example 3 of Bit Stream Extraction Unit 17)

Furthermore, the bit stream extraction unit 17 may be configured by combining Variation Example 1a and Variation Example 2 of the bit stream extraction unit 17. That is, in the case where the layer identifier of a reference layer (direct reference layer/indirect reference layer) on which each layer in the target set TargetSet depends is not included in the target set TargetSet, discarding the VCL and non-VCL NAL units of the reference layer having that layer identifier may be omitted; in the case where the layer identifier nuh_layer_id=0 is not included in the target set TargetSet, discarding the non-VCL NAL units including the parameter sets (VPS, SPS and PPS) with the layer identifier nuh_layer_id=0 may be omitted; and the VCL and non-VCL may thus be allowed to be included in the bit stream of the extracted target set TargetSet. In this case, in addition to the conformance condition CZ1, at least the conformance conditions CA1 and CA2 pertaining to the parameter sets (SPS and PPS) are required to be satisfied as bit stream conformances.

CA1: “The layer identifier of the active SPS for a certain layer A with the layer identifier nuh_layer_id=layerIdA is zero or layerIdA, or equal to the value of the layer identifier nuh_layer_id of a direct reference layer or an indirect reference layer of the layer A”

CA2: “The layer identifier of the active PPS for a certain layer A with the layer identifier nuh_layer_id=layerIdA is zero or layerIdA, or equal to the value of the layer identifier nuh_layer_id of a direct reference layer or an indirect reference layer of the layer A”
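
As an illustration only, the check shared by CA1 and CA2 may be sketched as follows; activeParamSetLayerId is the layer identifier of the active SPS (for CA1) or of the active PPS (for CA2) of the layer A, and the function and parameter names are assumptions introduced here.

#define MAX_LAYERS 64

int check_ca(int activeParamSetLayerId, int layerIdA,
             int LIdDependencyFlag[][MAX_LAYERS])
{
    /* Zero, the layer itself, or a direct/indirect reference layer. */
    return activeParamSetLayerId == 0 ||
           activeParamSetLayerId == layerIdA ||
           LIdDependencyFlag[layerIdA][activeParamSetLayerId];
}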

The operation of the bit stream extraction unit 17′″ according to Variation Example 3 in the case of application of the conformance conditions CZ1, CA1 and CA2 and the conventional condition “the layer identifier nuh_layer_id of the VPS is zero” is described with reference to FIG. 30. Operations common to those of the bit stream extraction unit 17″ according to Variation Example 2 are assigned the same symbols, and are not described. The bit stream extraction unit 17′″ has the same function as the VPS decoding means in the parameter set decoding unit 12 in order to derive the dependence layers from the VPS coding parameters. FIG. 30 is a flowchart showing the bit stream extraction process in units of access units in the bit stream extraction unit 17′″.

(SG102 b) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. Here, to satisfy the conformance condition CZ1, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32) or not. That is, in the case where the NAL unit type is the video parameter set (YES in SG102 b), the processing transitions to step SG10B. In the other case (No in SG102 b), the processing transitions to step SG10C.

(SG10B) The bit stream extraction unit 17′″ decodes the target NAL unit, which is a VPS, and derives the dependence layer (dependence layer set) of each layer included in the target set TargetSet. The process is the same as that in step SG10B in FIG. 29. Consequently, the description thereof is omitted.

(SG10C) It is determined whether all of the following conditions (1) and (2) are satisfied or not.

(1) It is determined whether the NAL unit type (nal_unit_type) of the target NAL unit is a parameter set or not on the basis of the “nal_unit_type” and “Name of nal_unit_type” shown in FIG. 7. More specifically, it is determined whether the NAL unit type is “VPS_NUT” (nal_unit_type==32), “SPS_NUT” (nal_unit_type==33) or “PPS_NUT” (nal_unit_type==34). That is, in the case where the NAL unit type is any of the parameter sets VPS, SPS and PPS, it is determined to be true. In the other case, it is determined to be false.

(2) It is determined whether the layer identifier of the target NAL unit is zero or not. In the case where the layer identifier of the target NAL unit is zero, it is determined to be true. In the other case, it is determined to be false.

In the case where both of the conditions (1) and (2) are true (the NAL unit type of the target NAL unit is a parameter set (VPS, SPS or PPS) and the layer identifier of the target NAL unit is zero) (YES in SG10C), the processing transitions to step SG107. In the other case (No in SG10C), the processing transitions to step SG105 b.
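
As an illustration only, the determination in step SG10C may be written as the following predicate; the nal_unit_type values follow the table in FIG. 7, and the function name is an assumption introduced here.

int sg10c_is_zero_layer_parameter_set(int nal_unit_type, int nuh_layer_id)
{
    int is_parameter_set = (nal_unit_type == 32 ||   /* VPS_NUT */
                            nal_unit_type == 33 ||   /* SPS_NUT */
                            nal_unit_type == 34);    /* PPS_NUT */
    /* True only when both condition (1) and condition (2) hold. */
    return is_parameter_set && (nuh_layer_id == 0);
}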

(SG105 b) It is determined whether or not the layer identifier and the temporal identifier of the target NAL unit are included in the target set TargetSet, or whether or not they indicate a dependence layer of a layer included in the target set TargetSet, on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget, of the target highest temporal identifier and of the dependency flag (inter-layer-identifiers dependency flag LIdDependencyFlag[ ][ ]). The process is the same as that in step SG105 b in FIG. 29. Consequently, the description thereof is omitted.

(SG106 b) The target NAL unit is discarded. That is, as the target NAL unit is not included in the target set TargetSet and is not in a dependence layer of any layer of the target set TargetSet, the bit stream extraction unit 17′″ removes the target NAL unit from the input hierarchically coded data DATA.

The operation of the bit stream extraction unit 17′″ according to Variation Example 3 has thus been described above. The steps are not limited to the above steps. Alternatively, the steps may be changed in an implementable range.

The bit stream extraction unit 17′″ according to Variation Example 3, which has been described above, is characterized by performing the bit stream extraction process on the basis of the layer ID list LayerIdListTarget of the layers constituting the target layer set LayerSetTarget supplied from the outside, of the target highest temporal identifier HighestTidTarget, and of the dependence layer information (the dependency flag LIdDependencyFlag[ ][ ] or DependencyFlag[ ][ ]) derived from the VPS; removing (discarding), from the input hierarchically coded data DATA, the NAL units whose layer identifier is not included in the target set TargetSet, which is defined by the target highest temporal identifier HighestTidTarget and the layer ID list LayerIdListTarget of the target layer set LayerSetTarget, and whose layer identifier is not that of a dependence layer of any layer in the target set TargetSet, except the NAL units including the parameter sets (VPS, SPS and PPS) with the layer identifier nuh_layer_id=0; and thus extracting and outputting the target layer set coded data DATA # T (BitstreamToDecode) made up of the NAL units having a layer identifier included in the target set TargetSet, the NAL units having the layer identifier of a dependence layer of each layer in the target set TargetSet, and the NAL units including the parameter sets (VPS, SPS and PPS) with nuh_layer_id=0. In other words, the bit stream extraction unit 17′″ does not discard the NAL units having the layer identifier of a dependence layer of each layer in the target set TargetSet or the NAL units of the parameter sets with nuh_layer_id=0, and includes these NAL units in the bit stream of the target set TargetSet.

The operation of the bit stream extraction unit 17′″ assumes that “the AU constituting the input hierarchically coded data DATA includes one (at the maximum) VPS having the lowest layer identifier in the AU”. However, the operation is not limited thereto. For example, a VPS having a layer identifier other than the lowest layer identifier in the AU may be included in the AU. In this case, the bit stream extraction unit 17′″ may regard the VPS having the lowest layer identifier among the layer identifiers that are not included in the target set TargetSet as the VPS from which the layer dependence information is derived in steps SG102 b to SG10B. Typically, since the VPS having the layer identifier “nuhLayerId=0” is the VPS having the lowest layer identifier, the layer dependence information may be derived from this VPS, and the other VPSs that are not included in the target set TargetSet are discarded (or ignored).

The aforementioned bit stream extraction unit 17′″ according to this embodiment can prevent the problem in that, in the bit stream after bit stream extraction, the layer set does not include the VCL and non-VCL NAL units pertaining to the dependence layers referred to from the layers in the layer set (the direct reference layers or the indirect reference layers), or the NAL units of the parameter sets (VPS/SPS/PPS) with nuh_layer_id=0. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

[Hierarchical Video Coding Apparatus]

The configuration of the hierarchical video coding apparatus 2 according to this embodiment is hereinafter described with reference to FIG. 22.

(Configuration of Hierarchical Video Coding Apparatus)

Referring to FIG. 22, the schematic configuration of the hierarchical video coding apparatus 2 is described. FIG. 22 is a functional block diagram showing the schematic configuration of the hierarchical video coding apparatus 2. The hierarchical video coding apparatus 2 codes an input image PIN # T (picture) on each layer included in the layer set (target layer set), which is to be coded, and thus generates the hierarchically coded data DATA on the target layer set. That is, the video coding apparatus 2 codes the pictures of the layers in ascending order, from the lowest layer ID to the highest layer ID included in the target layer set, and generates their coded data. In other words, the coded data of the pictures on each layer is coded in the order of the layer ID list LayerSetLayerIdList[0] . . . LayerSetLayerIdList[N−1] (N is the number of layers included in the target layer set) of the target layer set. For the sake of securely making the bit stream decodable by the hierarchical video decoding apparatus 1 (including its variations), the hierarchical video coding apparatus 2 generates the hierarchically coded data DATA of the target layer set so as to satisfy the aforementioned bit stream conformance CX1 (CX1′) or CX2 (CX2′) or CY1 or CY2 or (CY2 and CY3 and CY4) or CZ1 or (CZ1 and CA1 and CA2 and “the layer identifier nuh_layer_id of the VPS is zero”). The hierarchically coded data DATA satisfying the bit stream conformance is thus generated, which can prevent, in the hierarchical video decoding apparatus 1, occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

As shown in FIG. 22, the hierarchical video coding apparatus 2 includes a target layer set picture encoding unit 20, and an NAL multiplexing unit 21. Furthermore, the target layer set picture encoding unit 20 includes a parameter set encoding unit 22, a picture encoding unit 24, a decoded picture management unit 15, and a coding parameter determining unit 26.

The decoded picture management unit 15 is the same configuration element as the decoded picture management unit 15 included in the hierarchical video decoding apparatus 1, having been described above. However, the decoded picture management unit 15 included in the hierarchical video coding apparatus 2 is not required to output a picture recorded in the internal DPB as an output picture. Consequently, such output may be omitted. The description of “decoding” with respect to the decoded picture management unit 15 of the hierarchical video decoding apparatus 1 may be replaced with “coding”, and is then applicable to the decoded picture management unit 15 included in the hierarchical video coding apparatus 2.

The NAL multiplexing unit 21 stores, in NAL units, the input VCLs and non-VCLs on each layer of the target layer set to generate the NAL-multiplexed hierarchically coded data DATA # T, and outputs the data to the outside. In other words, the NAL multiplexing unit 21 stores (codes), in NAL units, the non-VCL coded data and VCL coded data supplied from the target layer set picture encoding unit 20, together with the NAL unit types, layer identifiers and temporal identifiers that correspond to the non-VCLs and VCLs, and generates the NAL-multiplexed hierarchically coded data DATA # T.

The coding parameter determining unit 26 selects one set from among sets of coding parameters. The coding parameters are various parameters pertaining to the parameter sets (VPS, SPS and PPS), prediction parameters for picture coding, and parameters that are coding targets generated in relation to the prediction parameters. The coding parameter determining unit 26 calculates a cost value representing the magnitude of the amount of information and the coding error for each of the sets of coding parameters. The cost value is, for example, the sum of the amount of code and the value obtained by multiplying the square error by a coefficient λ. The amount of code is the amount of information of the coded data on each layer of the target layer set that is obtained by variable-length-coding the quantization error and the coding parameters. The square error is the total sum, over the pixels, of the squared difference values between the input image PIN # T and the predictive image. The coefficient λ is a preset real number greater than zero. The coding parameter determining unit 26 selects the coding parameter set whose calculated cost value is the minimum, and supplies the selected coding parameter set to the parameter set encoding unit 22 and the picture encoding unit 24. The parameter sets output from the coding parameter determining unit 26 can be represented as the syntax values of the syntax pertaining to the parameter sets (VPS, SPS and PPS) included in the coded data, and the set of variables derived from the syntax values.
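
As an illustration only, the cost computation and the selection of the minimum-cost coding parameter set may be sketched as follows; the arrays of per-set code amounts and square errors, and the function name, are assumptions introduced here, and at least one candidate set is assumed to exist.

#include <stddef.h>

size_t select_min_cost_set(const double *amountOfCode,
                           const double *squareError,
                           double lambda, size_t numSets)
{
    size_t i, best = 0;
    /* cost = amount of code + lambda * square error */
    double cost, bestCost = amountOfCode[0] + lambda * squareError[0];
    for (i = 1; i < numSets; i++) {
        cost = amountOfCode[i] + lambda * squareError[i];
        if (cost < bestCost) {
            bestCost = cost;
            best = i;        /* remember the minimum-cost parameter set */
        }
    }
    return best;
}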

The parameter set encoding unit 22 sets the parameter sets (VPS, SPS and PPS) used to code the input image on the basis of the coding parameters of each parameter set input from the coding parameter determining unit 26 and of the input image, and supplies the NAL multiplexing unit 21 with each parameter set, as data to be stored in a non-VCL NAL unit. Typically, a parameter set is coded on the basis of a predetermined syntax table. That is, the syntax values of the syntax included in the syntax table are coded according to the procedures defined in the syntax table, and a bit sequence is thus generated and output as coded data. The parameter sets coded by the parameter set encoding unit 22 include the inter-layer dependence information (the direct dependency flag, the layer dependency type bit length, and the layer dependency type) described with respect to the parameter set decoding unit 12 included in the hierarchical video decoding apparatus 1. The parameter set encoding unit 22 codes the non-VCL dependency presence/absence flag as a part of the layer dependency type. When supplying the NAL multiplexing unit 21 with the non-VCL coded data, the parameter set encoding unit 22 adds, to this data, the NAL unit type, the layer identifier and the temporal identifier that correspond to the non-VCL, and outputs these items.

The parameter sets generated by the parameter set encoding unit 22 include the identifier to identify the parameter set itself, and the active parameter set identifier that identifies the parameter set (active parameter set) to which this parameter set refers for decoding pictures on the layers. More specifically, in the case of the video parameter set VPS, the VPS includes the VPS identifier to identify this VPS. In the case of the sequence parameter set SPS, the SPS includes the SPS identifier (sps_seq_parameter_set_id) to identify this SPS, and the active VPS identifier (sps_video_parameter_set_id) to identify the VPS to which this SPS and other syntax refer. In the case of the picture parameter set PPS, the PPS includes the PPS identifier (pps_pic_parameter_set_id) to identify this PPS, and the active SPS identifier (pps_seq_parameter_set_id) to identify the SPS to which this PPS and other syntax refer.
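
As an illustration only, the identifier chain described above may be represented by the following data layout; only the identifier fields are shown, and the field names mirror the syntax element names in the text.

typedef struct {
    int vps_video_parameter_set_id;   /* identifies this VPS   */
} VPS;

typedef struct {
    int sps_seq_parameter_set_id;     /* identifies this SPS   */
    int sps_video_parameter_set_id;   /* active VPS identifier */
} SPS;

typedef struct {
    int pps_pic_parameter_set_id;     /* identifies this PPS   */
    int pps_seq_parameter_set_id;     /* active SPS identifier */
} PPS;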

The picture encoding unit 24 codes the part of the input image on each layer corresponding to each of the slices constituting the picture, on the basis of the input image PIN # T on each layer, the parameter sets supplied by the coding parameter determining unit 26, and the reference pictures recorded in the decoded picture management unit 15, thus generates the coded data on this part, and supplies the NAL multiplexing unit 21 with the data, as data to be stored in VCL NAL units. The details of the picture encoding unit 24 are described later. When supplying the NAL multiplexing unit 21 with the VCL coded data, the picture encoding unit 24 adds, to this data, the NAL unit type, the layer identifier and the temporal identifier that correspond to the VCL, and outputs these items.

(Picture Encoding Unit 24)

Referring to FIG. 23, the details of the configuration of the picture encoding unit 24 are described. FIG. 23 is a functional block diagram showing the schematic configuration of the picture encoding unit 24.

As shown in FIG. 23, the picture encoding unit 24 includes a slice header encoding unit 241, and a CTU encoding unit 242.

The slice header encoding unit 241 generates a slice header used to code the input data on each layer that is input in units of slices, on the basis of the input active parameter set. The generated slice header is output as a part of the slice coded data, and is supplied to the CTU encoding unit 242 together with the input data. The slice header generated by the slice header encoding unit 241 includes the active PPS identifier that designates the picture parameter set PPS (active PPS) referred to for decoding pictures on the layers.

The CTU encoding unit 242 codes the input image (target slice part) in units of CTUs, on the basis of the input active parameter set and slice header, generates the slice data pertaining to the target slice and the decoded image (decoded picture), and outputs these items. More specifically, the CTU encoding unit 242 divides the input image in the target slice into units of CTBs having the CTB size included in the parameter set, and codes the image corresponding to each CTB as one CTU. The CTU coding is performed by a predictive residue encoding unit 2421, a predictive image encoding unit 2422, and a CTU decoded image generating unit 2423.

The predictive residue encoding unit 2421 outputs, as a part of the slice data included in the slice coded data, the quantization residue information (TT information) obtained by transforming and quantizing the difference image between the input image and the predictive image. The predictive residue is restored by applying inverse transformation and inverse quantization to the quantization residue information, and the restored predictive residue is output to the CTU decoded image generating unit 2423.

The predictive image encoding unit 2422 generates the predictive image on the basis of the predictive scheme of the target CTU included in the target slice and of the prediction parameters determined by the coding parameter determining unit 26, and outputs the image to the predictive residue encoding unit 2421 and the CTU decoded image generating unit 2423. Information on the predictive scheme and the prediction parameters is variable-length coded as predictive information (PT information), and output as a part of the slice data included in the slice coded data. The predictive schemes selectable by the predictive image encoding unit 2422 include at least inter-layer image prediction. In the case of using inter prediction or inter-layer image prediction, the corresponding reference picture is read from the decoded picture management unit 15.

The CTU decoded image generating unit 2423 is the same configuration element as the CTU decoded image generating unit 1423 included in the hierarchical video decoding apparatus 1. Consequently, the description thereof is omitted. The decoded image of the target CTU is supplied to the decoded picture management unit 15, and stored in the internal DPB.

<Coding Process in Picture Encoding Unit 24>

Hereinafter, referring to FIG. 24, the schematic operation of the picture coding on the target layer i in the picture encoding unit 24 is described. FIG. 24 is a flowchart showing the coding process in units of slices that constitute the picture on the target layer i in the picture encoding unit 24.

(SE101) The leading slice flag (first_slice_segment_in_pic_flag) of the coding target slice is coded. That is, when a piece of the input image divided into units of slices (hereinafter, the coding target slice) is the leading slice in the coding order (decoding order) (hereinafter, processing order) in the picture, the leading slice flag (first_slice_segment_in_pic_flag) is one. When the coding target slice is not the leading slice, the leading slice flag is zero. When the leading slice flag is one, the leading CTU address of the coding target slice is set to zero. Furthermore, the counter numCtu of the number of processed CTUs in the picture is set to zero. When the leading slice flag is zero, the leading CTU address of the coding target slice is set on the basis of the slice address coded in step SE106, which will be described later.

(SE102) The active PPS identifier (slice_pic_parameter_set_id) that designates the active PPS referred to during decoding of the coding target slice is coded.

(SE104) The active parameter sets determined by the coding parameter determining unit 26 are fetched. That is, the PPS having the PPS identifier (pps_pic_parameter_set_id) identical to the active PPS identifier (slice_pic_parameter_set_id) to which the coding target slice refers is regarded as the active PPS, and the coding parameters of the active PPS are fetched (read) from the coding parameter determining unit 26. Furthermore, the SPS having the SPS identifier (sps_seq_parameter_set_id) identical to the active SPS identifier (pps_seq_parameter_set_id) in the active PPS is regarded as the active SPS, and the coding parameters of the active SPS are fetched from the coding parameter determining unit 26. Moreover, the VPS having the VPS identifier (vps_video_parameter_set_id) identical to the active VPS identifier (sps_video_parameter_set_id) in the active SPS is regarded as the active VPS, and the coding parameters of the active VPS are fetched from the coding parameter determining unit 26.
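
As an illustration only, step SE104 may be sketched as follows; holding the parameter sets in plain arrays and the function name are assumptions introduced here, and the compact typedefs repeat the identifier layout given earlier so that the sketch is self-contained. The active PPS is found by slice_pic_parameter_set_id, the active SPS by the active SPS identifier in that PPS, and the active VPS by the active VPS identifier in that SPS.

#include <stddef.h>

typedef struct { int vps_video_parameter_set_id; } VPS;
typedef struct { int sps_seq_parameter_set_id, sps_video_parameter_set_id; } SPS;
typedef struct { int pps_pic_parameter_set_id, pps_seq_parameter_set_id; } PPS;

/* Returns 0 on success, -1 when an identifier cannot be resolved. */
int fetch_active_parameter_sets(int slice_pic_parameter_set_id,
                                const PPS *pps, size_t nPps,
                                const SPS *sps, size_t nSps,
                                const VPS *vps, size_t nVps,
                                const PPS **activePps,
                                const SPS **activeSps,
                                const VPS **activeVps)
{
    size_t i;
    *activePps = NULL; *activeSps = NULL; *activeVps = NULL;
    for (i = 0; i < nPps; i++)      /* active PPS by the slice header id */
        if (pps[i].pps_pic_parameter_set_id == slice_pic_parameter_set_id)
            *activePps = &pps[i];
    if (*activePps == NULL) return -1;
    for (i = 0; i < nSps; i++)      /* active SPS by the id in the PPS   */
        if (sps[i].sps_seq_parameter_set_id == (*activePps)->pps_seq_parameter_set_id)
            *activeSps = &sps[i];
    if (*activeSps == NULL) return -1;
    for (i = 0; i < nVps; i++)      /* active VPS by the id in the SPS   */
        if (vps[i].vps_video_parameter_set_id == (*activeSps)->sps_video_parameter_set_id)
            *activeVps = &vps[i];
    return (*activeVps != NULL) ? 0 : -1;
}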

(SE105) It is determined whether the coding target slice is the leading slice in the processing order in the picture or not, on the basis of the leading slice flag. In the case where the leading slice flag is zero (YES in SE105), the processing transitions to step SE106. In the other case (No in SE105), the process in step SE106 is skipped. In the case where the leading slice flag is one, the slice address of the coding target slice is zero.

(SE106) The slice address (slice_segment_address) of the coding target slice is coded. The slice address of the coding target slice (the leading CTU address of the coding target slice) can be set, for example, on the basis of the counter numCtu of the number of processed CTUs in the picture. In this case, the slice address slice_segment_address=numCtu. That is, the leading CTU address of the coding target slice=numCtu. The method of determining the slice address is not limited thereto, and can be changed in an implementable range.

. . . not shown . . . .

(SE10A) The CTU encoding unit 242 codes the input image (coding target slice) in units of CTUs, on the basis of the input active parameter sets and slice header, and outputs the coded data of the CTU information (SYNSD01 in FIG. 18) as a part of the slice data of the coding target slice. The CTU encoding unit 242 also generates and outputs the CTU decoded image of the region corresponding to each CTU. Furthermore, after the coded data of each piece of CTU information, a slice end flag (end_of_slice_segment_flag) indicating whether the CTU is the end of the coding target slice or not (SYNSD2 in FIG. 18) is coded. In the case where the CTU is the end of the coding target slice, the slice end flag is set to one. In the other case, the flag is set to zero. The set value is then coded. After each CTU is coded, the value of the number of processed CTUs numCtu is incremented by one (numCtu++).

(SE10B) It is determined whether the CTU is the end of the coding target slice or not on the basis of the slice end flag. In the case where the slice end flag is one (YES in SE10B), the processing transitions to step SE10C. In the other case (No in SE10B), the processing transitions to SE10A to code the subsequent CTU.

(SE10C) It is determined whether the number of processed CTUs numCtu reaches the total number of CTUs (PicSizeInCtbsY) that constitute the picture or not. That is, it is determined whether numCtu==PicSizeInCtbsY or not. In the case where numCtu is equal to PicSizeInCtbsY (YES in SE10C), the coding process in units of slices that constitute the coding target picture is finished. In the other case (numCtu<PicSizeInCtbsY) (No in SE10C), the processing transitions to step SE101 to continue the coding process in units of slices that constitute the coding target picture.
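
As an illustration only, the control flow of steps SE101 to SE10C for one picture may be sketched as follows; the code_* functions are stubs introduced here to make the flow concrete (a real encoder would emit actual syntax elements), and the four-CTU slice rule in the stub is arbitrary.

static void code_flag(int value)           { (void)value;  }  /* stub          */
static void code_active_pps_id(void)       {}                 /* SE102 stub    */
static void code_slice_address(int numCtu) { (void)numCtu; }  /* SE106 stub    */
static int  code_one_ctu(int ctuAddr)                         /* SE10A stub    */
{
    return (ctuAddr % 4) == 3;   /* toy rule: every slice holds four CTUs */
}

void code_picture_slices(int PicSizeInCtbsY)
{
    int numCtu = 0;
    while (numCtu < PicSizeInCtbsY) {            /* SE10C: more slices left?  */
        int first_slice = (numCtu == 0);
        code_flag(first_slice);                  /* SE101: leading slice flag */
        code_active_pps_id();                    /* SE102: active PPS id      */
        /* SE104: fetch active parameter sets (omitted here)                  */
        if (!first_slice)                        /* SE105                     */
            code_slice_address(numCtu);          /* SE106: slice address      */
        for (;;) {                               /* SE10A: per-CTU loop       */
            int end_of_slice = code_one_ctu(numCtu);
            numCtu++;
            code_flag(end_of_slice);             /* end_of_slice_segment_flag */
            if (end_of_slice || numCtu == PicSizeInCtbsY)
                break;                           /* SE10B                     */
        }
    }
}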

The operation of the picture encoding unit 24 according to Embodiment 1 has thus been described above. The steps are not limited to the above steps. Alternatively, the steps may be changed in an implementable range.

(Advantageous Effects of Video Coding Apparatus 2)

For the sake of securely making the bit stream decodable by the hierarchical video decoding apparatus 1 (including its variations), the hierarchical video coding apparatus 2 according to this embodiment, described above, generates the hierarchically coded data DATA of the target layer set so as to satisfy the aforementioned bit stream conformance CX1 (CX1′) or CX2 (CX2′) or CY1 or CY2 or (CY2 and CY3 and CY4) or CZ1 or (CZ1 and CA1 and CA2 and “the layer identifier nuh_layer_id of the VPS is zero”). The hierarchically coded data DATA satisfying the bit stream conformance is thus generated, thereby allowing the hierarchical video decoding apparatus 1 to prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

The hierarchical video coding apparatus 2 according to this embodiment, described above, shares the parameter sets used to code the reference layer as the parameter sets (VPS, SPS and PPS) used to code the target layer. This sharing can reduce the amount of code pertaining to the parameter sets on the target layer. That is, the parameter sets can be coded with a smaller amount of code.

(Application Example to Another Hierarchical Video Coding/Decoding System)

The aforementioned hierarchical video coding apparatus 2 and the hierarchical video decoding apparatus 1 can be used in a manner mounted on various apparatuses for video transmitting, receiving, recording and reproducing. The video may be natural video taken by a camera and the like, or artificial video (CG and GUI) generated by a computer and the like.

Referring to FIG. 25, it is described that the aforementioned hierarchical video coding apparatus 2 and the hierarchical video decoding apparatus 1 can be used for video transmitting and receiving. FIG. 25(a) is a block diagram showing the configuration of a transmitting apparatus PROD_A mounted with the hierarchical video coding apparatus 2.

As shown in FIG. 25(a), the transmitting apparatus PROD_A includes an encoding unit PROD_A1 that obtains coded data by coding video, a modulating unit PROD_A2 that obtains a modulated signal by modulating carrier waves using the coded data obtained by the encoding unit PROD_A1, and a transmitting unit PROD_A3 that transmits the modulated signal obtained by the modulating unit PROD_A2. The aforementioned hierarchical video coding apparatus 2 is used as the encoding unit PROD_A1.

The transmitting apparatus PROD_A may further include a camera PROD_A4 that serves as a supply source of video to be input into the encoding unit PROD_A1 and takes video, a recording medium PROD_A5 that records video, an input terminal PROD_A6 for receiving video from the outside, and an image processing unit PROD_A7 that generates or processes images. In FIG. 25(a), the configuration where the transmitting apparatus PROD_A is provided with all of these elements is illustrated. However, some of these may be omitted.

The recording medium PROD_A5 may record uncoded video. Alternatively, this medium may record video coded according to a coding scheme for recording that is different from the coding scheme for transmission. In the latter case, it is preferred that a decoder (not shown) that decodes coded data read from the recording medium PROD_A5 according to the coding scheme for recording intervene between the recording medium PROD_A5 and the encoding unit PROD_A1.

FIG. 25(b) is a block diagram showing the configuration of a receiving apparatus PROD_B mounted with the hierarchical video decoding apparatus 1. As shown in FIG. 25(b), the receiving apparatus PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulating unit PROD_B2 that obtains coded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains video by decoding the coded data obtained by the demodulating unit PROD_B2. The aforementioned hierarchical video decoding apparatus 1 is used as the decoding unit PROD_B3.

The receiving apparatus PROD_B may further include a display PROD_B4 that serves as a supply destination of video to be output from the decoding unit PROD_B3 and displays video, a recording medium PROD_B5 for recording video, and an output terminal PROD_B6 for outputting video to the outside. In FIG. 25(b), the configuration where the receiving apparatus PROD_B is provided with all of these elements is illustrated. However, some of these may be omitted.

The recording medium PROD_B5 may be for recording uncoded video. Alternatively, this medium may record video coded according to a coding scheme for recording that is different from the coding scheme for transmission. In the latter case, it is preferred that an encoder (not shown) that codes video obtained from the decoding unit PROD_B3 according to the coding scheme for recording intervene between the decoding unit PROD_B3 and the recording medium PROD_B5.

The transmission medium that transmits the modulated signal may be a wireless or wired medium. The transmission manner that transmits the modulated signal may be broadcast (here, indicating a transmission manner where the transmission destination has not preliminarily been specified). This manner may also be communication (here, indicating a transmission manner where the transmission destination has been preliminarily specified). That is, the transmission of the modulated signal may be achieved by any of wireless broadcast, wired broadcast, wireless communication, and wired communication.

For example, a broadcast station for terrestrial digital broadcast (broadcast facilities and the like)/receiving unit (television receiving unit and the like) is an example of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and receiving the modulated signal through wireless broadcasting. A broadcast station for cable television broadcast (broadcast facilities and the like)/receiving unit (television receiving unit and the like) is an example of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and receiving the modulated signal through wired broadcasting.

A server (workstation etc.)/client (television receiving unit, personal computer, smartphone, etc.) for a VOD (Video On Demand) service or a video sharing service using the Internet is an example of the transmitting apparatus PROD_A/receiving apparatus PROD_B for transmitting and receiving the modulated signal through communication (typically, any of wireless and wired transmission media is used in a LAN, and a wired transmission medium is used in a WAN). Here, the personal computer may be any of a desktop PC, a laptop PC, and a tablet PC. The smartphone may be a multi-functional mobile phone.

A client of a video sharing service has not only a function of decoding the coded data downloaded from a server and displaying the data, but also a function of coding video taken by a camera and uploading the video to the server. That is, the client of the video sharing service functions as both the transmitting apparatus PROD_A and the receiving apparatus PROD_B.

Referring to FIG. 26, it is described that the aforementioned hierarchical video coding apparatus 2 and the hierarchical video decoding apparatus 1 can be used for video recording and reproducing. FIG. 26(a) is a block diagram showing the configuration of a recording apparatus PROD_C mounted with the hierarchical video coding apparatus 2.

As shown in FIG. 26(a), the recording apparatus PROD_C includes an encoder PROD_C1 that obtains coded data by coding video, and a writing unit PROD_C2 that writes, in a recording medium PROD_M, the coded data obtained by the encoder PROD_C1. The aforementioned hierarchical video coding apparatus 2 is used as the encoder PROD_C1.

The recording medium PROD_M may be (1) a medium embedded in the recording apparatus PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) a medium connected to the recording apparatus PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) a medium inserted in a drive apparatus (not shown) embedded in the recording apparatus PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc®).

The recording apparatus PROD_C may further include a camera PROD_C3 that serves as a supply source of video to be input into the encoder PROD_C1 and takes video, an input terminal PROD_C4 for receiving video from the outside, a receiving unit PROD_C5 for receiving video, and an image processing unit PROD_C6 that generates or processes images. In FIG. 26(a), the configuration where the recording apparatus PROD_C is provided with all of these elements is illustrated. However, some of these may be omitted.

The receiving unit PROD_C5 may be for receiving uncoded video. Alternatively, this receiving unit may receive coded data coded according to a coding scheme for transmission that is different from the coding scheme for recording. In the latter case, it is preferred that a decoder for transmission (not shown) that decodes coded data coded according to the coding scheme for transmission intervene between the receiving unit PROD_C5 and the encoder PROD_C1.

Examples of such a recording apparatus PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in these cases, the input terminal PROD_C4 or the receiving unit PROD_C5 serves as a main supply source of video). A camcorder (in this case, the camera PROD_C3 serves as a main supply source of video), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit PROD_C6 serves as a main supply source of video), and a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 serves as a main supply source of video) are also examples of such a recording apparatus PROD_C.

FIG. 26(b) is a block diagram showing the configuration of a reproducing apparatus PROD_D mounted with the aforementioned hierarchical video decoding apparatus 1. As shown in FIG. 26(b), the reproducing apparatus PROD_D includes a reading unit PROD_D1 that reads coded data written in the recording medium PROD_M, and a decoding unit PROD_D2 that obtains video by decoding the coded data read by the reading unit PROD_D1. The aforementioned hierarchical video decoding apparatus 1 is used as the decoding unit PROD_D2.

The recording medium PROD_M may be (1) a medium embedded in the reproducing apparatus PROD_D, such as an HDD or an SSD, (2) a medium connected to the reproducing apparatus PROD_D, such as an SD memory card or a USB flash memory, or (3) a medium inserted in a drive apparatus (not shown) embedded in the reproducing apparatus PROD_D, such as a DVD or a BD.

The reproducing apparatus PROD_D may further include a display PROD_D3 that serves as a supply destination of video to be output from the decoding unit PROD_D2 and displays video, an output terminal PROD_D4 for outputting the video to the outside, and a transmitting unit PROD_D5 that transmits the video. In FIG. 26(b), the configuration where the reproducing apparatus PROD_D is provided with all of these elements is illustrated. However, some of these may be omitted.

The transmitting unit PROD_D5 may be for transmitting uncoded video. Alternatively, this transmitting unit may transmit coded data coded according to a coding scheme for transmission that is different from the coding scheme for recording. In the latter case, it is preferred that an encoder (not shown) that codes video according to the coding scheme for transmission intervene between the decoding unit PROD_D2 and the transmitting unit PROD_D5.

Such a reproducing apparatus PROD_D may be, for example, a DVD player, a BD player, an HDD player or the like (in this case, the output terminal PROD_D4 to which a television receiving unit or the like is connected serves as a main supply destination of video). A television receiving unit (in this case, the display PROD_D3 serves as a main supply destination of video), a digital signage (also called an electronic signage or electronic bulletin board; the display PROD_D3 or the transmitting unit PROD_D5 serves as a main supply destination of video), a desktop PC (in this case, the output terminal PROD_D4 or the transmitting unit PROD_D5 serves as a main supply destination of video), a laptop or tablet PC (in this case, the display PROD_D3 or the transmitting unit PROD_D5 serves as a main supply destination of video), a smartphone (in this case, the display PROD_D3 or the transmitting unit PROD_D5 serves as a main supply destination of video) and the like are also examples of such a reproducing apparatus PROD_D.

(On Achievement into Hardware and Achievement into Software)

Finally, each of the blocks of the hierarchical video decoding apparatus 1 and the hierarchical video coding apparatus 2 may be achieved by a logic circuit formed on an integrated circuit (IC chip) in a hardware manner, or achieved in a software manner using a CPU (Central Processing Unit).

In the latter case, each of the apparatuses includes a CPU that executes instructions of control programs that achieve the functions, ROM (Read Only Memory) that stores the programs, RAM (Random Access Memory) on which the programs are deployed, and a storing apparatus (recording medium), such as memory, which stores the programs and various data. The object of the present disclosure can also be achieved by supplying each of the apparatuses with a recording medium that records, in a computer-readable manner, the program code (executable programs, intermediate code programs, source programs) of the control programs, which are software for achieving the aforementioned functions, and by causing the computer (CPU or MPU (Micro Processing Unit)) to read the program code recorded in the recording medium.

The recording medium may be, for example, tape, such as magnetic tape or cassette tape; disks including a magnetic disk, such as a Floppy® disk/hard disk, and an optical disk, such as CD-ROM (Compact Disc Read-Only Memory)/MO (Magneto-Optical)/MD (Mini Disc)/DVD (Digital Versatile Disc)/CD-R (CD Recordable); cards, such as an IC card (including a memory card)/optical card; semiconductor memories, such as mask ROM/EPROM (Erasable Programmable Read-only Memory)/EEPROM® (Electrically Erasable and Programmable Read-only Memory)/flash ROM; or logic circuits including PLD (Programmable Logic Device) or FPGA (Field Programmable Gate Array).

Each of the apparatuses may be configured to be connectable to a communication network, and may be supplied with the program code via the communication network. The communication network may be any element that can transmit the program code, and is not limited. For example, the Internet, an intranet, an extranet, a LAN (Local Area Network), ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television) communication network, a Virtual Private Network, a telephone network, a mobile communication network, a satellite communication network and the like can be used. The transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type. For example, any of wired elements, such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power-line carrier, cable TV line, telephone line, or ADSL (Asymmetric Digital Subscriber Line) circuit, or any of wireless elements, such as an infrared element including IrDA (Infrared Data Association) or a remote control, Bluetooth®, IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA® (Digital Living Network Alliance), a mobile phone network, a satellite circuit, or a terrestrial digital network, can be used. The present disclosure may also be achieved in the form of a computer data signal embedded in carrier waves, embodied through electronic transmission of the program code.

CONCLUSION

An image decoding apparatus according to aspect 1 of the present disclosure is an image decoding apparatus that decodes input image coded data, including: an image-coded data extractor that extracts image coded data pertaining to a decoding target layer set including at least one layer, from the input image coded data, based on a layer ID list indicating the decoding target layer set; and a picture decoding unit that decodes a picture in the decoding target layer set, from the extracted image coded data, wherein the image coded data extracted by the image-coded data extractor does not include a non-VCL NAL unit having a layer identifier that is not equal to zero and is not included in the layer ID list.

The image decoding apparatus of aspect 2 of the present disclosure is that according to aspect 1, wherein the temporal ID of an NAL unit included in the image coded data is equal to or less than the value of the highest temporal ID of the decoding target layer set.

The image decoding apparatus of aspect 3 of the present disclosure is that according to aspect 1, wherein the non-VCL NAL unit is an NAL unit having a parameter set.

The image decoding apparatus of aspect 4 of the present disclosure is that according to aspect 3, wherein the parameter set includes a video parameter set.

The image decoding apparatus of aspect 5 of the present disclosure is that according to aspect 3, wherein the parameter set includes a sequence parameter set.

The image decoding apparatus of aspect 6 of the present disclosure is that according to aspect 3, wherein the parameter set includes a picture parameter set.

An image decoding method of aspect 7 of the present disclosure is an image decoding method of decoding input image coded data, comprising: an image-coded data extracting step of extracting image coded data pertaining to a decoding target layer set including at least one layer, from the input image coded data, based on a layer ID list indicating the decoding target layer set; and a picture decoding step of decoding a picture in the decoding target layer set, from the extracted image coded data, wherein the image coded data extracted in the image-coded data extracting step does not include a non-VCL NAL unit having a layer identifier that is not equal to zero and is not included in the layer ID list.
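
As an illustration only, the extraction constraint recited in aspects 1 and 7 may be sketched as the following keep/discard predicate; the function name is an assumption introduced here, and the treatment of VCL NAL units outside the layer ID list follows the usual extraction rule rather than the aspect wording itself.

#include <stddef.h>

int keep_nal_unit(int is_vcl, int nuh_layer_id,
                  const int *layerIdList, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (layerIdList[i] == nuh_layer_id)
            return 1;            /* in the decoding target layer set         */
    if (!is_vcl && nuh_layer_id == 0)
        return 1;                /* zero-id non-VCL (e.g. parameter sets)    */
    return 0;                    /* excluded from the extracted coded data   */
}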

An image decoding apparatus of aspect 8 of the present disclosure is an image decoding apparatus that includes an image-coded data extractor for extracting decoding target image coded data from input image coded data, based on a layer ID list of a target layer set; wherein the image-coded data extractor further includes a layer identifier updating unit that updates the layer identifier of a non-video coding layer NAL unit, which is in the input image coded data and has a smaller layer identifier than the lowest layer identifier in the layer ID list of the target layer set, with the lowest layer identifier; and the image-coded data extractor discards the NAL units having a layer identifier not included in the layer ID list of the target layer set, from the image coded data including the non-video coding layer NAL unit whose layer identifier has been updated by the layer identifier updating unit, and generates the decoding target image coded data.

The image decoding apparatus described above can prevent the problem in that the NAL unit on the non-video coding layer is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.
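
As an illustration only, the layer identifier updating unit of aspect 8 may be sketched as follows; the NalUnit type is an illustrative stand-in for a parsed NAL unit header, and the function name is an assumption introduced here. Non-VCL NAL units whose layer identifier is below the lowest identifier in the layer ID list are rewritten to that lowest identifier before the ordinary discard step.

#include <stddef.h>

typedef struct {
    int nuh_layer_id;
    int is_vcl;      /* nonzero for video coding layer NAL units */
} NalUnit;

void update_layer_identifiers(NalUnit *units, size_t numUnits,
                              const int *layerIdList, size_t n)
{
    size_t i, k;
    int lowest = layerIdList[0];
    for (k = 1; k < n; k++)
        if (layerIdList[k] < lowest)
            lowest = layerIdList[k];   /* lowest id in the target layer set */
    for (i = 0; i < numUnits; i++)
        if (!units[i].is_vcl && units[i].nuh_layer_id < lowest)
            units[i].nuh_layer_id = lowest;
}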

An image decoding apparatus of aspect 9 of the present disclosure is an image decoding apparatus that includes an image-coded data extractor for extracting decoding target image coded data from input image coded data, based on a layer ID list of a target layer set, wherein the image-coded data extractor discards an NAL unit having a layer identifier not included in the layer ID list of the target layer set, from the input image coded data, except an NAL unit with a parameter set having a layer identifier of zero.

The image decoding apparatus described above can prevent the problem in that the NAL unit of the parameter set having a layer identifier of zero is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

An image decoding apparatus of aspect 10 of the present disclosure is an image decoding apparatus that includes an image-coded data extractor for extracting decoding target image coded data from input image coded data, based on a layer ID list of a target layer set, wherein the image-coded data extractor further includes a dependence layer information deriving unit that derives the dependence information on which each layer included in the target layer set in the input image coded data depends, and the image-coded data extractor discards an NAL unit having a layer identifier not included in the layer ID list of the target layer set from the input image coded data, except an NAL unit on the dependence layer derived by the dependence layer information deriving unit, and generates the decoding target image coded data.

The image decoding apparatus described above can prevent the problem in that the NAL unit on the dependence layer on which each layer included in the target layer set depends is not included in the layer set on the bit stream after bit stream extraction. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

The image decoding apparatus of aspect 11 of the present disclosure is that according to aspect 8, wherein the non-video coding layer NAL unit is an NAL unit including a parameter set.

The image decoding apparatus described above can prevent the problem in that the NAL unit of the parameter set is not included in the layer set on the bit stream after bit stream extraction.

The image decoding apparatus of aspect 12 of the present disclosure is that according to aspect 9 or 11, wherein the parameter set is a video parameter set.

The image decoding apparatus described above can prevent the problem in that the NAL unit of the video parameter set is not included in the layer set on the bit stream after bit stream extraction.

The image decoding apparatus of aspect 13 of the present disclosure is that according to aspect 9 or 11, wherein the parameter set is a sequence parameter set.

The image decoding apparatus described above can prevent the problem in that the NAL unit of the sequence parameter set is not included in the layer set on the bit stream after bit stream extraction.

The image decoding apparatus of aspect 14 of the present disclosure is that according to aspect 9 or 11, wherein the parameter set is a picture parameter set.

The image decoding apparatus described above can prevent the problem in that the NAL unit of the picture parameter set is not included in the layer set on the bit stream after bit stream extraction.

The image decoding apparatus of aspect 15 of the present disclosure is that according to aspect 10, wherein the dependence layer is a direct reference layer or an indirect reference layer.

The image decoding apparatus described above can prevent the problem in that the NAL unit on the direct reference layer or the indirect reference layer on which each layer included in the target layer set depends is not included in the layer set on the bit stream after bit stream extraction.

Image coded data of aspect 16 of the present disclosure is image coded data satisfying a conformance condition that a video parameter set referred to by a certain target layer has the same layer identifier as the VCL having the lowest layer identifier in an access unit including the target layer.

The image coded data described above can prevent the problem in that, in the sub-bit-stream generated through bit stream extraction, no video parameter set is included in the layer set. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

Image coded data of aspect 17 of the present disclosure is image coded data satisfying a conformance condition that a video parameter set referred to by a certain target layer has a layer identifier value of zero.

The image coded data described above can prevent the problem in that, in the sub-bit-stream generated through bit stream extraction, no video parameter set having a layer identifier of zero is included in the layer set. That is, it can prevent occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from the bit stream of a certain layer set and only includes a layer set that is a subset of that layer set.

The image coded data of aspect 18 of the present disclosure is that according to aspect 17, further satisfying a conformance condition that a sequence parameter set referred to by a certain target layer has a layer identifier value of zero.

The image coded data described above can prevent the problem in that, in the sub-bit-stream generated through bit stream extraction, no sequence parameter set having a layer identifier value of zero is included in the layer set.

The image coded data of aspect 19 of the present disclosure is that according to aspect 17 or 18, further satisfying a conformance condition that a picture parameter set referred to by a certain target layer has a layer identifier value of zero.

The image coded data described above can prevent the problem in that, in the sub-bit-stream generated through bit stream extraction, no picture parameter set having a layer identifier value of zero is included in the layer set.

The image coded data of aspect 20 of the present disclosure is imagecoded data satisfying a conformance condition that the layer setincludes a dependence layer referred to by a certain target layer in thelayer set.

The image coded data described above can prevent the problem in which no dependence layer referred to by a certain target layer in the layer set is included in the layer set in the sub-bit-stream generated through bit stream extraction. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream including a certain layer set and that only includes a layer set made up of a subset of that layer set.
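The aspect-20 condition amounts to requiring that the layer ID list be closed under the inter-layer dependence relation. The sketch below walks the direct and indirect reference layers of each member; the direct_refs mapping (layer identifier to its direct reference layer identifiers) is assumed to be derived from the inter-layer dependency syntax of the video parameter set, and all names are illustrative.

    def layer_set_is_closed(layer_id_list, direct_refs) -> bool:
        """Aspect 20: the layer ID list contains every direct and indirect
        reference layer of each of its members."""
        members = set(layer_id_list)
        for layer in layer_id_list:
            stack = list(direct_refs.get(layer, ()))
            seen = set()
            while stack:  # depth-first walk over direct and indirect refs
                ref = stack.pop()
                if ref in seen:
                    continue
                seen.add(ref)
                if ref not in members:
                    return False  # this dependence layer would be lost
                stack.extend(direct_refs.get(ref, ()))
        return True

    # Layer 2 depends on layer 1, which in turn depends on layer 0:
    refs = {2: [1], 1: [0], 0: []}
    assert layer_set_is_closed([0, 1, 2], refs)
    assert not layer_set_is_closed([0, 2], refs)  # layer 1 is missing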

An image coding apparatus of aspect 21 of the present disclosure is an image coding apparatus for generating image coded data from an input layer image corresponding to a target layer set, based on a layer ID list of the target layer set, wherein the image coding apparatus generates image coded data satisfying a conformance condition that, in the target layer set, the layer identifier of a non-video coding layer referred to by a certain target layer is identical to that of the VCL having the lowest layer identifier in an access unit of the target layer set.

The image coding apparatus described above can prevent the problem in which the NAL unit on the non-video coding layer referred to by a certain target layer is not included in the sub-bit-stream generated through bit stream extraction from the image coded data generated by the image coding apparatus. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream including a certain layer set and that only includes a layer set made up of a subset of that layer set.

An image coding apparatus of aspect 22 of the present disclosure is an image coding apparatus for generating image coded data from an input layer image corresponding to a target layer set, based on a layer ID list of the target layer set, wherein the image coding apparatus generates image coded data satisfying a conformance condition that, in the target layer set, the layer identifier of a non-video coding layer referred to by a certain target layer is the lowest layer identifier in the layer ID list of the target layer set.

The image coding apparatus described above can prevent the problem in which the NAL unit on the non-video coding layer referred to by a certain target layer is not included in the sub-bit-stream generated through bit stream extraction from the image coded data generated by the image coding apparatus. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream including a certain layer set and that only includes a layer set made up of a subset of that layer set.
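Seen from the encoder side, aspects 21 and 22 differ only in which lowest layer identifier the non-video coding layer (for example, a parameter set) is assigned when it is emitted. A minimal sketch of the two choices, with illustrative names rather than an actual encoder API:

    def choose_non_vcl_layer_id(vcl_layer_ids_in_au, layer_id_list,
                                rule="aspect21"):
        """Pick the layer identifier of a non-VCL NAL unit so that bit
        stream extraction of the target layer set cannot discard it."""
        if rule == "aspect21":
            # same identifier as the VCL with the lowest layer id in the AU
            return min(vcl_layer_ids_in_au)
        # aspect 22: lowest identifier in the layer ID list of the layer set
        return min(layer_id_list)

    print(choose_non_vcl_layer_id([1, 2], [1, 2]))                   # -> 1
    print(choose_non_vcl_layer_id([1, 2], [1, 2], rule="aspect22"))  # -> 1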

An image coding apparatus of aspect 23 of the present disclosure is an image coding apparatus for generating image coded data from an input layer image corresponding to a target layer set, based on a layer ID list of the target layer set, wherein the image coding apparatus generates image coded data satisfying a conformance condition that, in the target layer set, a dependence layer on which each layer depends is included in the target layer set.

The image coding apparatus described above can prevent the problem in which the NAL unit on the dependence layer referred to by a certain target layer is not included in the sub-bit-stream generated through bit stream extraction from the image coded data generated by the image coding apparatus. That is, it can prevent the occurrence of a layer that cannot be decoded on a bit stream that is generated through the bit stream extraction process from a bit stream including a certain layer set and that only includes a layer set made up of a subset of that layer set.

The present disclosure is not limited to the embodiments described above. Various changes can be made within the scope represented in the claims. Any embodiment obtained by combining the technical measures disclosed in the various embodiments is also included in the technical scope of the present disclosure.

INDUSTRIAL APPLICABILITY

The present disclosure is suitably applicable to a hierarchical video decoding apparatus that decodes coded data where image data is hierarchically coded, and to a hierarchical video coding apparatus that generates the coded data where image data is hierarchically coded. The present disclosure is also suitably applicable to the data structure of hierarchically coded data generated by the hierarchical video coding apparatus and referred to by the hierarchical video decoding apparatus.

DESCRIPTION OF SYMBOLS

- 1 . . . Hierarchical video decoding apparatus
- 2 . . . Hierarchical video coding apparatus
- 10 . . . Target layer set picture decoding unit
- 11 . . . NAL demultiplexing unit
- 12 . . . Parameter set decoding unit
- 13 . . . Parameter set management unit
- 14 . . . Picture decoding unit
- 141 . . . Slice header decoding unit
- 142 . . . CTU decoding unit
- 1421 . . . Predictive residue restoring unit
- 1422 . . . Predictive image generating unit
- 1423 . . . CTU decoded image generating unit
- 15 . . . Decoded picture management unit
- 17 . . . Bit stream extraction unit (image-coded data extractor)
- 20 . . . Target layer set picture encoding unit
- 21 . . . NAL multiplexing unit
- 22 . . . Parameter set encoding unit
- 24 . . . Picture encoding unit
- 26 . . . Coding parameter determining unit
- 241 . . . Slice header encoding unit
- 242 . . . CTU encoding unit
- 2421 . . . Predictive residue encoding unit
- 2422 . . . Predictive image encoding unit
- 2423 . . . CTU decoded image generating unit

CLAIMS

1. An image decoding apparatus for decoding input image coded data, comprising: an image-coded data extractor that extracts image coded data pertaining to a decoding target layer set including at least one layer, from the input image coded data, based on a layer ID list indicating the decoding target layer set, and, in the case where a layer identifier of a non-VCL NAL unit is not included in the decoding target layer set, rewrites the layer identifier of the non-VCL NAL unit to the lowest layer identifier in the decoding target layer set; and a picture decoding unit that decodes a picture in the decoding target layer set, from the extracted image coded data, wherein a non-VCL NAL unit having a layer identifier that is not equal to the lowest layer identifier in an access unit and is not included in the layer ID list is discarded from the input image coded data.
2. The image decoding apparatus according to claim 1, wherein a temporal ID of an NAL unit included in the image coded data is equal to or less than a value of a highest temporal ID of the decoding target layer set.

3. The image decoding apparatus according to claim 1, wherein the non-VCL NAL unit is an NAL unit having a parameter set.

4. The image decoding apparatus according to claim 3, wherein the parameter set includes a video parameter set.

5. The image decoding apparatus according to claim 3, wherein the parameter set includes a sequence parameter set.

6. The image decoding apparatus according to claim 3, wherein the parameter set includes a picture parameter set.

7. An image decoding method of decoding input image coded data, comprising: an image-coded data extracting step of extracting image coded data pertaining to a decoding target layer set including at least one layer, from the input image coded data, based on a layer ID list indicating the decoding target layer set, and, in the case where a layer identifier of a non-VCL NAL unit is not included in the decoding target layer set, rewriting the layer identifier of the non-VCL NAL unit to the lowest layer identifier in the decoding target layer set; and a picture decoding step of decoding a picture in the decoding target layer set, from the extracted image coded data, wherein a non-VCL NAL unit having a layer identifier that is not equal to the lowest layer identifier in an access unit and is not included in the layer ID list is discarded from the input image coded data.
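For illustration only, the following sketch traces the extraction behavior recited in claims 1 and 7, together with the temporal ID bound of claim 2. The Nal record and its fields are hypothetical stand-ins for the HEVC NAL unit header syntax, and the treatment of VCL NAL units follows the usual sub-bit-stream extraction convention rather than anything recited in the claims; this is not an implementation of the standard.

    from dataclasses import dataclass, replace
    from typing import List

    @dataclass(frozen=True)
    class Nal:
        layer_id: int     # stand-in for nuh_layer_id
        temporal_id: int  # stand-in for TemporalId
        is_vcl: bool

    def extract(access_unit: List[Nal], layer_id_list: List[int],
                highest_tid: int) -> List[Nal]:
        lowest_in_set = min(layer_id_list)
        lowest_in_au = min(n.layer_id for n in access_unit)
        out = []
        for nal in access_unit:
            if nal.temporal_id > highest_tid:
                continue  # claim 2: keep only NAL units within the TID bound
            if nal.is_vcl:
                # usual sub-bit-stream behavior: keep VCL NAL units whose
                # layer identifier belongs to the target layer set
                if nal.layer_id in layer_id_list:
                    out.append(nal)
                continue
            # claim 1: discard a non-VCL NAL unit whose layer identifier is
            # neither the lowest in the access unit nor in the layer ID list
            if nal.layer_id != lowest_in_au and nal.layer_id not in layer_id_list:
                continue
            # claim 1: rewrite a kept non-VCL layer identifier that falls
            # outside the set to the lowest identifier in the set
            if nal.layer_id not in layer_id_list:
                nal = replace(nal, layer_id=lowest_in_set)
            out.append(nal)
        return out

For example, extracting the layer set {1} from an access unit whose parameter sets sit on layer 0 keeps those parameter sets and rewrites their layer identifiers to 1, so the extracted sub-bit-stream remains decodable.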